import pandas as pd
import numpy as np
import altair as alt
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
customer_df = pd.read_csv('final/customers.csv', sep='|')
customer_df.head()
| | ssn | cc_num | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | acct_num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 |
| 1 | 715-55-5575 | 4351161559407816183 | Elaine | Fuller | F | 310 Kendra Common Apt. 164 | Leland | NC | 28451 | 34.2680 | -78.0578 | 27112 | Professor Emeritus | 1963-06-07 | 917558277935 |
| 2 | 167-48-5821 | 4192832764832 | Michael | Cameron | M | 05641 Robin Port | Cordova | SC | 29039 | 33.4275 | -80.8857 | 4215 | International aid/development worker | 1973-05-30 | 718172762479 |
| 3 | 406-83-7518 | 4238849696532874 | Brandon | Williams | M | 26916 Carlson Mountain | Birmingham | AL | 35242 | 33.3813 | -86.7046 | 493806 | Seismic interpreter | 1942-12-26 | 947268892251 |
| 4 | 697-93-1877 | 4514627048281480 | Lisa | Hernandez | F | 809 Burns Creek | Fargo | GA | 31631 | 30.7166 | -82.5801 | 559 | Medical laboratory scientific officer | 1939-05-22 | 888335239225 |
customer_df['acct_num'].nunique()
1000
customer_df.shape
(1000, 15)
customer_df.columns
Index(['ssn', 'cc_num', 'first', 'last', 'gender', 'street', 'city', 'state',
'zip', 'lat', 'long', 'city_pop', 'job', 'dob', 'acct_num'],
dtype='object')
import os
directory = './final'
dfs = []
for filename in os.listdir(directory):
    if filename.startswith('transactions'):
        print(f'Reading {filename}...')
        filepath = os.path.join(directory, filename)
        df = pd.read_csv(filepath, sep='|')
        dfs.append(df)
    else:
        print(f'{filename} is not a CSV file. Skipping...')
if len(dfs) > 0:
    transaction_df = pd.concat(dfs, ignore_index=True)
    print(f'Successfully merged {len(dfs)} dataframes into one.')
else:
    print('No CSV files found in directory.')
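The prefix filter above can also be expressed with a glob pattern, which skips non-matching files without an explicit check. A minimal sketch on tiny stand-in files (filenames and columns here are illustrative; the real files live in `./final` and have many more columns):

```python
import os
import tempfile
from glob import glob

import pandas as pd

# Create toy stand-in files in a temporary directory
tmpdir = tempfile.mkdtemp()
for i in range(3):
    pd.DataFrame({'cc_num': [i], 'amt': [10.0 * i]}).to_csv(
        os.path.join(tmpdir, f'transactions_{i}.csv'), sep='|', index=False)
pd.DataFrame({'ssn': ['000-00-0000']}).to_csv(
    os.path.join(tmpdir, 'customers.csv'), sep='|', index=False)

# The pattern matches only transaction files, so customers.csv is
# excluded without a startswith() branch
paths = sorted(glob(os.path.join(tmpdir, 'transactions_*.csv')))
transactions = pd.concat((pd.read_csv(p, sep='|') for p in paths),
                         ignore_index=True)
```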
Reading transactions_12.csv...
Reading transactions_126.csv...
Reading transactions_127.csv...
Reading transactions_13.csv...
customers.csv is not a CSV file. Skipping...
... (128 more transaction files read) ...
Successfully merged 132 dataframes into one.
transaction_df.head()
| | cc_num | acct_num | trans_num | unix_time | category | amt | is_fraud | merchant | merch_lat | merch_long |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4896331812335761701 | 149852234418 | f3ec0819590302134f03ffdc2f44697f | 1646060228 | gas_transport | 65.17 | 0 | Larson, Ryan and Huang | 38.143430 | -90.327335 |
| 1 | 4896331812335761701 | 149852234418 | c1607c993e41f2c3b42d72d1506bef7b | 1644848624 | gas_transport | 47.58 | 0 | Myers-Reed | 39.119498 | -90.760379 |
| 2 | 4896331812335761701 | 149852234418 | 6f530db25d20fe351249a54491fd3fde | 1645632153 | gas_transport | 64.43 | 0 | Baker-Bullock | 39.384368 | -90.361517 |
| 3 | 4896331812335761701 | 149852234418 | 6d11805f2acd938fec99376001afafe8 | 1645311286 | gas_transport | 82.47 | 0 | Spencer-Hall | 39.443567 | -89.752400 |
| 4 | 4896331812335761701 | 149852234418 | 605342f297c575cb1ccf2c08cad082ee | 1641571926 | gas_transport | 50.28 | 0 | King, Rodriguez and Hancock | 38.857278 | -89.609525 |
transaction_df.shape
(4260904, 10)
transaction_df.columns
Index(['cc_num', 'acct_num', 'trans_num', 'unix_time', 'category', 'amt',
'is_fraud', 'merchant', 'merch_lat', 'merch_long'],
dtype='object')
transaction_df['acct_num'].nunique()
983
There were 1,000 unique customers in customer_df but only 983 unique customers in transaction_df, so 17 accounts have no recorded transactions.
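The missing accounts can be listed explicitly with a set difference. A toy sketch (account numbers here are made up; the real frames have 1,000 and 983 unique accounts respectively):

```python
import pandas as pd

# Stand-in frames mimicking customer_df and transaction_df
customers = pd.DataFrame({'acct_num': ['a1', 'a2', 'a3']})
transactions = pd.DataFrame({'acct_num': ['a1', 'a1', 'a3']})

# Accounts that appear in the customer file but have no transactions
inactive = sorted(set(customers['acct_num']) - set(transactions['acct_num']))
```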
df = customer_df.merge(transaction_df, on=['cc_num', 'acct_num'])
df.head()
| | ssn | cc_num | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | acct_num | trans_num | unix_time | category | amt | is_fraud | merchant | merch_lat | merch_long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 91ab12e73ef38206e1121e9648d2408d | 1558719550 | gas_transport | 69.12 | 0 | Phillips Group | 39.491416 | -75.588522 |
| 1 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 071553d533a6822a4431c354c434ddcb | 1569425519 | grocery_pos | 68.11 | 0 | Tucker Ltd | 40.890319 | -75.573359 |
| 2 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 0cfad38ef15e4749eff68dc83f62c151 | 1577205601 | misc_net | 40.35 | 0 | Dixon PLC | 39.244958 | -74.475327 |
| 3 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 5782693d7c70f062f258cb30bfa8900f | 1571428238 | grocery_pos | 96.22 | 0 | Lambert-Cooper | 39.656925 | -75.802342 |
| 4 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 35fd7db657d7e30dd608c37f7798186e | 1549840400 | gas_transport | 71.89 | 0 | Griffith LLC | 40.313342 | -74.220434 |
df['acct_num'].nunique()
983
df.shape
(4260904, 23)
df.columns
Index(['ssn', 'cc_num', 'first', 'last', 'gender', 'street', 'city', 'state',
'zip', 'lat', 'long', 'city_pop', 'job', 'dob', 'acct_num', 'trans_num',
'unix_time', 'category', 'amt', 'is_fraud', 'merchant', 'merch_lat',
'merch_long'],
dtype='object')
df.isna().sum()
ssn           0
cc_num        0
first         0
last          0
gender        0
street        0
city          0
state         0
zip           0
lat           0
long          0
city_pop      0
job           0
dob           0
acct_num      0
trans_num     0
unix_time     0
category      0
amt           0
is_fraud      0
merchant      0
merch_lat     0
merch_long    0
dtype: int64
df.duplicated().sum()
0
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4260904 entries, 0 to 4260903
Data columns (total 23 columns):
 #   Column      Dtype
---  ------      -----
 0   ssn         object
 1   cc_num      object
 2   first       object
 3   last        object
 4   gender      object
 5   street      object
 6   city        object
 7   state       object
 8   zip         int64
 9   lat         float64
 10  long        float64
 11  city_pop    int64
 12  job         object
 13  dob         object
 14  acct_num    object
 15  trans_num   object
 16  unix_time   object
 17  category    object
 18  amt         float64
 19  is_fraud    object
 20  merchant    object
 21  merch_lat   float64
 22  merch_long  float64
dtypes: float64(5), int64(2), object(16)
memory usage: 780.2+ MB
df['unix_time'].nunique()
4114752
df_cleaned = df.copy()
In the data preparation stage, we will take the following steps to ready the dataset for the business case of predicting next month's spending:
Conversion of Unix Time: The Unix time values will be transformed into a more interpretable format called 'trans_month_year'. This conversion provides the transaction month and year, allowing us to analyze trends over time and establishing a chronological framework for predicting future spending.
Age Calculations: Using the date of birth information ('dob'), we will calculate the age of each customer. Age can be a relevant factor in spending behaviour, so we will categorize ages into seven bins: <18, 18-24, 25-34, 35-44, 45-54, 55-64, and 65+, enabling us to evaluate the impact of age on predictions of future spending.
Filtering out Fraudulent Transactions: To ensure the accuracy and reliability of our regression analysis, we will filter out fraudulent transactions from the dataset. By excluding the fraudulent transactions, we focus solely on the legitimate customers' spending patterns, which are crucial for predicting the next month’s spending.
Grouping and Aggregating Data: The dataset will be grouped by the 'trans_month_year' variable to examine monthly spending patterns. Transaction data will be aggregated by calculating the sum, mean, maximum, minimum, and count of transactions for each month. Additionally, the transaction data will be flattened to include separate columns for the total and number of transactions in each month. These aggregated statistics will provide valuable insights into customer spending behaviour over time and facilitate accurate predictions for the next month's spending.
Inclusion of Customer Demographics: To enrich the dataset, we will incorporate customer demographics such as gender, age, job, city, and age_grouped. These variables will provide additional insight into the factors that influence spending behaviour and enhance the accuracy of our regression model.
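The seven-bin age grouping described above can be sketched with `pd.cut`; the bin edges below are one reasonable reading of the listed ranges (ages here are illustrative):

```python
import pandas as pd

# right=False makes each bin closed on the left, so 18 falls in
# '18-24' and 65 falls in '65+'
ages = pd.Series([15, 22, 30, 40, 50, 60, 70])
bins = [0, 18, 25, 35, 45, 55, 65, 120]
labels = ['<18', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']
age_grouped = pd.cut(ages, bins=bins, labels=labels, right=False)
```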
df_cleaned['dob'] = pd.to_datetime(df_cleaned['dob'])
df_cleaned['trans_date_time'] = pd.to_datetime(df_cleaned['unix_time'], unit='s')
df_cleaned['trans_month_year'] = pd.to_datetime(df_cleaned['trans_date_time']).dt.to_period('M')
df_cleaned['quarter'] = pd.to_datetime(df_cleaned['trans_date_time']).dt.to_period('Q')
df_cleaned['first_trans_month_year'] = df_cleaned.groupby('acct_num')['trans_month_year'].transform('min')
df_cleaned['first_trans_month_year'] = df_cleaned['first_trans_month_year'].dt.to_timestamp()
age = ((df_cleaned['first_trans_month_year'] - df_cleaned['dob']).dt.days / 365.25).apply(round)
df_cleaned['age'] = age
df_cleaned.head()
| | ssn | cc_num | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | acct_num | trans_num | unix_time | category | amt | is_fraud | merchant | merch_lat | merch_long | trans_date_time | trans_month_year | quarter | first_trans_month_year | age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 91ab12e73ef38206e1121e9648d2408d | 1558719550 | gas_transport | 69.12 | 0 | Phillips Group | 39.491416 | -75.588522 | 2019-05-24 17:39:10 | 2019-05 | 2019Q2 | 2018-12-01 | 59 |
| 1 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 071553d533a6822a4431c354c434ddcb | 1569425519 | grocery_pos | 68.11 | 0 | Tucker Ltd | 40.890319 | -75.573359 | 2019-09-25 15:31:59 | 2019-09 | 2019Q3 | 2018-12-01 | 59 |
| 2 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 0cfad38ef15e4749eff68dc83f62c151 | 1577205601 | misc_net | 40.35 | 0 | Dixon PLC | 39.244958 | -74.475327 | 2019-12-24 16:40:01 | 2019-12 | 2019Q4 | 2018-12-01 | 59 |
| 3 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 5782693d7c70f062f258cb30bfa8900f | 1571428238 | grocery_pos | 96.22 | 0 | Lambert-Cooper | 39.656925 | -75.802342 | 2019-10-18 19:50:38 | 2019-10 | 2019Q4 | 2018-12-01 | 59 |
| 4 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 35fd7db657d7e30dd608c37f7798186e | 1549840400 | gas_transport | 71.89 | 0 | Griffith LLC | 40.313342 | -74.220434 | 2019-02-10 23:13:20 | 2019-02 | 2019Q1 | 2018-12-01 | 59 |
import seaborn as sns
import matplotlib.pyplot as plt
palette='ch:.25'
pred_trans_df = df_cleaned.copy()
There are 14 unique categories.
pred_trans_df['category'].nunique()
14
There are 939 unique zip codes.
pred_trans_df['zip'].nunique()
939
There are 726 unique cities.
pred_trans_df['city'].nunique()
726
There are 51 unique states.
pred_trans_df['state'].nunique()
51
There are 505 unique jobs.
pred_trans_df['job'].nunique()
505
pred_trans_df = pred_trans_df[pred_trans_df['is_fraud'] == 0]
pred_trans_df.shape
(4255870, 28)
pred_trans_df.columns
Index(['ssn', 'cc_num', 'first', 'last', 'gender', 'street', 'city', 'state',
'zip', 'lat', 'long', 'city_pop', 'job', 'dob', 'acct_num', 'trans_num',
'unix_time', 'category', 'amt', 'is_fraud', 'merchant', 'merch_lat',
'merch_long', 'trans_date_time', 'trans_month_year', 'quarter',
'first_trans_month_year', 'age'],
dtype='object')
pred_trans_df.head()
| | ssn | cc_num | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | acct_num | trans_num | unix_time | category | amt | is_fraud | merchant | merch_lat | merch_long | trans_date_time | trans_month_year | quarter | first_trans_month_year | age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 91ab12e73ef38206e1121e9648d2408d | 1558719550 | gas_transport | 69.12 | 0 | Phillips Group | 39.491416 | -75.588522 | 2019-05-24 17:39:10 | 2019-05 | 2019Q2 | 2018-12-01 | 59 |
| 1 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 071553d533a6822a4431c354c434ddcb | 1569425519 | grocery_pos | 68.11 | 0 | Tucker Ltd | 40.890319 | -75.573359 | 2019-09-25 15:31:59 | 2019-09 | 2019Q3 | 2018-12-01 | 59 |
| 2 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 0cfad38ef15e4749eff68dc83f62c151 | 1577205601 | misc_net | 40.35 | 0 | Dixon PLC | 39.244958 | -74.475327 | 2019-12-24 16:40:01 | 2019-12 | 2019Q4 | 2018-12-01 | 59 |
| 3 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 5782693d7c70f062f258cb30bfa8900f | 1571428238 | grocery_pos | 96.22 | 0 | Lambert-Cooper | 39.656925 | -75.802342 | 2019-10-18 19:50:38 | 2019-10 | 2019Q4 | 2018-12-01 | 59 |
| 4 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 35fd7db657d7e30dd608c37f7798186e | 1549840400 | gas_transport | 71.89 | 0 | Griffith LLC | 40.313342 | -74.220434 | 2019-02-10 23:13:20 | 2019-02 | 2019Q1 | 2018-12-01 | 59 |
pred_trans_df['month'] = pred_trans_df['trans_month_year'].dt.month
pred_trans_df['year'] = pred_trans_df['trans_month_year'].dt.year
monthly_spending = pred_trans_df.groupby(['year', 'month']).agg({'amt': ['sum', 'mean', 'max', 'min'],
'trans_num': 'count'}).reset_index()
monthly_spending.columns = ['year', 'month', 'total_amt', 'mean_amt', 'max_amt', 'min_amt', 'trans_count']
monthly_spending.tail()
| | year | month | total_amt | mean_amt | max_amt | min_amt | trans_count |
|---|---|---|---|---|---|---|---|
| 44 | 2022 | 8 | 9810985.07 | 63.096803 | 24583.86 | 1.0 | 155491 |
| 45 | 2022 | 9 | 7837465.56 | 62.719293 | 16460.30 | 1.0 | 124961 |
| 46 | 2022 | 10 | 8217907.95 | 62.749845 | 25159.92 | 1.0 | 130963 |
| 47 | 2022 | 11 | 8046118.86 | 64.152884 | 23235.32 | 1.0 | 125421 |
| 48 | 2022 | 12 | 16418548.14 | 63.093177 | 23949.46 | 1.0 | 260227 |
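The agg-then-rename pattern above can also be written with pandas named aggregation, which yields flat column names directly; a toy sketch (values are made up):

```python
import pandas as pd

# Stand-in transaction-level frame with the columns the aggregation uses
df = pd.DataFrame({
    'year':  [2022, 2022, 2022],
    'month': [11, 11, 12],
    'amt':   [10.0, 20.0, 30.0],
    'trans_num': ['t1', 't2', 't3'],
})
# Named aggregation: output column name = (input column, function)
out = df.groupby(['year', 'month']).agg(
    total_amt=('amt', 'sum'),
    mean_amt=('amt', 'mean'),
    trans_count=('trans_num', 'count'),
).reset_index()
```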
sns.lineplot(data=monthly_spending, x='month', y='total_amt', hue='year', palette=palette)
plt.ylabel('Total Monthly Spending ($)')
plt.ticklabel_format(style='plain', axis='y')
plt.yticks(np.arange(0, 1.8e7, 0.2e7))
plt.title('Total Monthly Spending Across Customers')
plt.show()
There seems to be a seasonal trend in spending. The data shows a yearly cycle of higher spending in November, December, and March, and December spending is markedly higher than in any other month. There could be many reasons behind this, such as holidays, sales, or promotions.
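One way to quantify this seasonal pattern is to average total spending by calendar month across years; a minimal sketch on made-up numbers (the real monthly_spending frame has 49 rows):

```python
import pandas as pd

# Toy monthly totals spanning two years
ms = pd.DataFrame({
    'year':  [2019, 2019, 2020, 2020],
    'month': [11, 12, 11, 12],
    'total_amt': [10.0, 20.0, 12.0, 22.0],
})
# Average spend per calendar month, pooled across years
seasonal_avg = ms.groupby('month')['total_amt'].mean()
```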
monthly_spending['year_month'] = monthly_spending['year'].astype(str) + '-' + monthly_spending['month'].astype(str)
fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')
ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)
ax.set_title('Total Monthly Spending', fontsize=14)
Text(0.5, 1.0, 'Total Monthly Spending')
fig, ax = plt.subplots(figsize=(18, 6))
ax2 = ax.twinx()
ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')
ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)
ax.set_title('Total Monthly Spending and Transactions', fontsize=14)
ax2.bar(monthly_spending['year_month'], monthly_spending['trans_count'], color='grey', alpha=0.5)
ax2.set_ylabel('Transaction Count', fontsize=12)
plt.show()
fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')
ax.axvspan('2019-11', '2020-1', color='red', alpha=0.1)
ax.axvspan('2020-11', '2021-1', color='red', alpha=0.1)
ax.axvspan('2021-11', '2022-1', color='red', alpha=0.1)
ax.axvspan('2022-11', '2022-12', color='red', alpha=0.1)
ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)
ax.set_title('Total Monthly Spending', fontsize=14)
plt.show()
fig, ax = plt.subplots(figsize=(18, 6))
ax.plot(monthly_spending['year_month'], monthly_spending['total_amt'], color='black')
ax2 = ax.twinx()
ax2.bar(monthly_spending['year_month'], monthly_spending['trans_count'], color='grey', alpha=0.5)
ax2.set_ylabel('Total Transactions', fontsize=12)
ax2.yaxis.set_major_formatter('{:.0f}'.format)
ax.axvspan('2019-11', '2020-1', color='red', alpha=0.1)
ax.axvspan('2020-11', '2021-1', color='red', alpha=0.1)
ax.axvspan('2021-11', '2022-1', color='red', alpha=0.1)
ax.axvspan('2022-11', '2022-12', color='red', alpha=0.1)
ax.set_xlabel('Year-Month', fontsize=12)
ax.tick_params(axis='x', rotation=45)
ax.set_ylabel('Total Spending ($)', fontsize=12)
ax.yaxis.set_major_formatter('${:.0f}'.format)
ax.legend(['Total Spending'], loc='upper left')
ax2.legend(['Total Transactions'], loc='upper right')
plt.title('Total Monthly Spending and Transactions', fontsize=14)
plt.show()
total_spent_per_month = pred_trans_df.groupby(['acct_num', 'trans_month_year']).agg({
'category': lambda x: x.value_counts().idxmax(),
'age': 'min',
'gender': 'first',
'job': 'first',
'city': 'first',
'state': 'first',
'zip': 'first',
'trans_num': 'count',
'amt': ['mean', 'max', 'min', 'sum']
}).reset_index()
total_spent_per_month.columns = ['acct_num', 'trans_month_year', 'category', 'age', 'gender', 'job', 'city', 'state', 'zip', 'trans_count', 'mean_amt', 'max_amt', 'min_amt', 'total_amt']
total_spent_per_month.head()
| | acct_num | trans_month_year | category | age | gender | job | city | state | zip | trans_count | mean_amt | max_amt | min_amt | total_amt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2348758451 | 2018-12 | gas_transport | 42 | M | Surveyor, minerals | Rochester | NY | 14621 | 1 | 96.050000 | 96.05 | 96.05 | 96.05 |
| 1 | 2348758451 | 2019-01 | gas_transport | 42 | M | Surveyor, minerals | Rochester | NY | 14621 | 40 | 78.377750 | 359.87 | 5.67 | 3135.11 |
| 2 | 2348758451 | 2019-02 | gas_transport | 42 | M | Surveyor, minerals | Rochester | NY | 14621 | 49 | 62.837143 | 245.29 | 4.41 | 3079.02 |
| 3 | 2348758451 | 2019-03 | gas_transport | 42 | M | Surveyor, minerals | Rochester | NY | 14621 | 57 | 54.632456 | 131.91 | 1.73 | 3114.05 |
| 4 | 2348758451 | 2019-04 | gas_transport | 42 | M | Surveyor, minerals | Rochester | NY | 14621 | 60 | 69.893500 | 1183.46 | 1.27 | 4193.61 |
fig, axes = plt.subplots(6, 1, figsize=(16, 18))
for i, c in enumerate(['total_amt', 'mean_amt', 'trans_count', 'age', 'gender', 'state']):
sns.histplot(data=total_spent_per_month[c], ax=axes[i], kde=True, color='orange')
plt.tight_layout()
plt.show()
customer_summary = pd.pivot_table(total_spent_per_month,
index=['acct_num'],
columns=['trans_month_year'],
values=['trans_count', 'total_amt'],
fill_value=0)
customer_summary_flat = pd.DataFrame(customer_summary.to_records())
customer_summary_flat.columns = [col.replace("('", "").replace("', '", "_").replace("'))", "")
for col in customer_summary_flat.columns]
customer_summary_flat.head()
(Output truncated: customer_summary_flat has one row per acct_num and 99 columns — a "total_amt', PeriodYYYY-MM_M" and a "trans_count', PeriodYYYY-MM_M" column for each of the 49 months from 2018-12 through 2022-12, plus acct_num. Note the column names retain stray quote fragments from the string cleanup above; the later cells reference these names verbatim.)
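The `to_records()` round-trip and string cleanup above leave stray quote fragments in the column names; an alternative sketch flattens the pivot's columns directly (toy data, illustrative names):

```python
import pandas as pd

# Stand-in for total_spent_per_month with one amount column
tsm = pd.DataFrame({
    'acct_num': ['a1', 'a1', 'a2'],
    'month':    ['2022-11', '2022-12', '2022-11'],
    'total_amt': [5.0, 6.0, 7.0],
})
wide = pd.pivot_table(tsm, index='acct_num', columns='month',
                      values='total_amt', fill_value=0)
# Rename the month columns in place instead of parsing a records dump
wide.columns = [f'total_amt_{m}' for m in wide.columns]
wide = wide.reset_index()
```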
customer_summary_flat = customer_summary_flat.rename(columns={"total_amt', Period2022-12_M": "Target"})
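Column names like `total_amt', Period2022-12_M` are what you typically get when a flattened MultiIndex is stringified. Assuming the summary was pivoted into (metric, period) column tuples, a cleaner flattening looks like this (toy frame, illustrative names):

```python
import pandas as pd

# Toy frame with (metric, period) MultiIndex columns, standing in for the
# pivoted customer summary (names here are illustrative, not the real data).
df = pd.DataFrame(
    [[100.0, 3], [250.0, 7]],
    columns=pd.MultiIndex.from_tuples(
        [("total_amt", "2022-11"), ("trans_count", "2022-12")]
    ),
)

# Join the levels with an explicit separator instead of stringifying tuples,
# which avoids stray quotes in names like "total_amt', Period2022-12_M".
df.columns = ["_".join(col) for col in df.columns]
print(df.columns.tolist())  # → ['total_amt_2022-11', 'trans_count_2022-12']
```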
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(8, 6, figsize=(25, 25))
axes = axes.flatten()
for i, c in enumerate(customer_summary_flat.loc[:, "total_amt', Period2018-12_M":"total_amt', Period2022-11_M"]):
sns.scatterplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], color='orange')
sns.regplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], scatter=False, color='red')
corr = customer_summary_flat['Target'].corr(customer_summary_flat[c])
axes[i].set_title('Corr: {:.2f}'.format(corr), fontsize=12)
plt.tight_layout()
plt.show()
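The per-column correlations computed one at a time inside the loop above can also be obtained in a single vectorized call with `DataFrame.corrwith` — a sketch on toy data (illustrative names):

```python
import numpy as np
import pandas as pd

# Toy stand-in for customer_summary_flat (column names are illustrative).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(100, 4)), columns=["m1", "m2", "m3", "Target"])

# corrwith computes each column's Pearson correlation with Target at once,
# matching the Series.corr calls made one column at a time in the plot loop.
corrs = toy.drop(columns="Target").corrwith(toy["Target"])
assert np.isclose(corrs["m1"], toy["m1"].corr(toy["Target"]))
```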
fig, axes = plt.subplots(8, 6, figsize=(25, 25))
axes = axes.flatten()
for i, c in enumerate(customer_summary_flat.loc[:, "total_amt', Period2018-12_M":"total_amt', Period2022-11_M"]):
sns.boxplot(x=c, data=customer_summary_flat, ax=axes[i], color='orange', flierprops=dict(markerfacecolor='red'))
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(7, 7, figsize=(25, 25))
axes = axes.flatten()
for i, c in enumerate(customer_summary_flat.loc[:, "trans_count', Period2018-12_M":"trans_count', Period2022-12_M"]):
sns.scatterplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], color='orange')
sns.regplot(y="Target", x=c, data=customer_summary_flat, ax=axes[i], scatter=False, color='red')
corr = customer_summary_flat['Target'].corr(customer_summary_flat[c])
axes[i].set_title('Corr: {:.2f}'.format(corr), fontsize=12)
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(7, 7, figsize=(25, 25))
axes = axes.flatten()
for i, c in enumerate(customer_summary_flat.loc[:, "trans_count', Period2018-12_M":"trans_count', Period2022-12_M"]):
sns.boxplot(x=c, data=customer_summary_flat, ax=axes[i], color='orange', flierprops=dict(markerfacecolor='red'))
plt.tight_layout()
plt.show()
customer_demographics = pred_trans_df[['acct_num', 'gender', 'age', 'job', 'city']]
trans_df = pd.merge(customer_summary_flat, customer_demographics, on='acct_num')
trans_df.drop_duplicates(inplace=True)
trans_df.reset_index(drop=True, inplace=True)
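Because `pred_trans_df` repeats each customer's demographics once per transaction, the merge above multiplies rows before `drop_duplicates` prunes them back. Deduplicating the demographics first keeps the merge one-to-one, which `validate=` can then enforce — a sketch with toy data (illustrative values):

```python
import pandas as pd

# Toy stand-ins: one summary row per account, but demographics repeated
# once per transaction, as in pred_trans_df.
summary = pd.DataFrame({"acct_num": [1, 2], "Target": [10.0, 20.0]})
demo = pd.DataFrame({"acct_num": [1, 1, 2], "gender": ["M", "M", "F"]})

# Deduplicate the right side first so the merge stays one row per account;
# validate raises immediately if an account ever maps to two demographic rows.
merged = summary.merge(demo.drop_duplicates(), on="acct_num", validate="one_to_one")
print(len(merged))  # → 2
```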
bins = [0, 18, 24, 34, 44, 54, 64, 200]
labels = ['<18', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']
trans_df['age_group'] = pd.cut(trans_df['age'], bins=bins, labels=labels)
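One boundary worth checking: `pd.cut` uses right-closed intervals by default, so the first bin is `(0, 18]` and an 18-year-old lands in `<18`, not `18-24`. A quick check (passing `right=False` would flip this if the boundary matters):

```python
import pandas as pd

bins = [0, 18, 24, 34, 44, 54, 64, 200]
labels = ['<18', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']

# Default right=True makes bins right-closed: (0, 18], (18, 24], ..., (64, 200].
ages = pd.Series([17, 18, 19, 65])
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())  # → ['<18', '<18', '18-24', '65+']
```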
target_index = trans_df.columns.get_loc("Target")
cols = list(trans_df.columns)
cols.append(cols.pop(target_index))
trans_df = trans_df[cols]
trans_df.head()
| acct_num | total_amt', Period2018-12_M | total_amt', Period2019-01_M | total_amt', Period2019-02_M | total_amt', Period2019-03_M | total_amt', Period2019-04_M | total_amt', Period2019-05_M | total_amt', Period2019-06_M | total_amt', Period2019-07_M | total_amt', Period2019-08_M | total_amt', Period2019-09_M | total_amt', Period2019-10_M | total_amt', Period2019-11_M | total_amt', Period2019-12_M | total_amt', Period2020-01_M | total_amt', Period2020-02_M | total_amt', Period2020-03_M | total_amt', Period2020-04_M | total_amt', Period2020-05_M | total_amt', Period2020-06_M | total_amt', Period2020-07_M | total_amt', Period2020-08_M | total_amt', Period2020-09_M | total_amt', Period2020-10_M | total_amt', Period2020-11_M | total_amt', Period2020-12_M | total_amt', Period2021-01_M | total_amt', Period2021-02_M | total_amt', Period2021-03_M | total_amt', Period2021-04_M | total_amt', Period2021-05_M | total_amt', Period2021-06_M | total_amt', Period2021-07_M | total_amt', Period2021-08_M | total_amt', Period2021-09_M | total_amt', Period2021-10_M | total_amt', Period2021-11_M | total_amt', Period2021-12_M | total_amt', Period2022-01_M | total_amt', Period2022-02_M | total_amt', Period2022-03_M | total_amt', Period2022-04_M | total_amt', Period2022-05_M | total_amt', Period2022-06_M | total_amt', Period2022-07_M | total_amt', Period2022-08_M | total_amt', Period2022-09_M | total_amt', Period2022-10_M | total_amt', Period2022-11_M | trans_count', Period2018-12_M | trans_count', Period2019-01_M | trans_count', Period2019-02_M | trans_count', Period2019-03_M | trans_count', Period2019-04_M | trans_count', Period2019-05_M | trans_count', Period2019-06_M | trans_count', Period2019-07_M | trans_count', Period2019-08_M | trans_count', Period2019-09_M | trans_count', Period2019-10_M | trans_count', Period2019-11_M | trans_count', Period2019-12_M | trans_count', Period2020-01_M | trans_count', Period2020-02_M | trans_count', Period2020-03_M | trans_count', Period2020-04_M | trans_count', Period2020-05_M | trans_count', Period2020-06_M | trans_count', Period2020-07_M | trans_count', Period2020-08_M | trans_count', Period2020-09_M | trans_count', Period2020-10_M | trans_count', Period2020-11_M | trans_count', Period2020-12_M | trans_count', Period2021-01_M | trans_count', Period2021-02_M | trans_count', Period2021-03_M | trans_count', Period2021-04_M | trans_count', Period2021-05_M | trans_count', Period2021-06_M | trans_count', Period2021-07_M | trans_count', Period2021-08_M | trans_count', Period2021-09_M | trans_count', Period2021-10_M | trans_count', Period2021-11_M | trans_count', Period2021-12_M | trans_count', Period2022-01_M | trans_count', Period2022-02_M | trans_count', Period2022-03_M | trans_count', Period2022-04_M | trans_count', Period2022-05_M | trans_count', Period2022-06_M | trans_count', Period2022-07_M | trans_count', Period2022-08_M | trans_count', Period2022-09_M | trans_count', Period2022-10_M | trans_count', Period2022-11_M | trans_count', Period2022-12_M | gender | age | job | city | age_group | Target | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2348758451 | 96.05 | 3135.11 | 3079.02 | 3114.05 | 4193.61 | 4290.73 | 3657.02 | 3765.51 | 4004.26 | 2416.59 | 3962.87 | 3124.60 | 6008.22 | 3077.01 | 2998.80 | 3586.09 | 2436.95 | 3604.75 | 4707.65 | 2372.02 | 3111.07 | 6454.39 | 1679.22 | 2026.95 | 6503.73 | 1753.81 | 1483.12 | 4791.39 | 1432.27 | 2754.88 | 2991.87 | 3684.23 | 2806.12 | 2531.25 | 1559.79 | 2112.00 | 5953.41 | 5345.96 | 10263.91 | 14583.42 | 7094.84 | 7880.77 | 7649.20 | 9317.50 | 6850.65 | 6256.74 | 6588.37 | 8882.73 | 1 | 40 | 49 | 57 | 60 | 75 | 67 | 70 | 68 | 51 | 57 | 45 | 102 | 44 | 53 | 56 | 44 | 54 | 71 | 56 | 66 | 45 | 51 | 50 | 118 | 46 | 34 | 50 | 47 | 50 | 71 | 76 | 62 | 54 | 42 | 51 | 119 | 97 | 104 | 150 | 136 | 144 | 160 | 190 | 159 | 135 | 152 | 127 | 285 | M | 42 | Surveyor, minerals | Rochester | 35-44 | 13408.74 |
| 1 | 2468061102 | 5.75 | 4793.59 | 2950.34 | 5128.18 | 5516.67 | 5324.35 | 7142.46 | 6812.52 | 6650.47 | 4145.92 | 5302.92 | 6493.92 | 11419.12 | 4553.61 | 3378.39 | 3979.67 | 5257.87 | 6132.76 | 6587.81 | 5275.80 | 5193.65 | 6320.92 | 4682.45 | 4766.77 | 9564.12 | 3048.51 | 2648.57 | 5633.02 | 7646.09 | 5595.21 | 4295.62 | 6773.62 | 4391.10 | 3689.29 | 4440.16 | 4611.52 | 10070.20 | 7917.40 | 5189.47 | 11352.79 | 9277.44 | 9837.36 | 13240.34 | 12826.66 | 11121.19 | 10107.50 | 11085.32 | 12425.87 | 2 | 71 | 53 | 82 | 77 | 98 | 114 | 115 | 106 | 83 | 74 | 90 | 158 | 58 | 60 | 58 | 86 | 112 | 110 | 94 | 97 | 82 | 80 | 63 | 155 | 62 | 52 | 78 | 89 | 101 | 86 | 116 | 80 | 59 | 82 | 88 | 181 | 130 | 100 | 174 | 151 | 180 | 217 | 201 | 209 | 179 | 191 | 160 | 310 | F | 60 | Nurse, adult | Oceanside | 55-64 | 18979.94 |
| 2 | 3005591724 | 9.47 | 1428.09 | 2065.59 | 2644.69 | 1240.21 | 2920.90 | 3629.99 | 2437.47 | 2174.59 | 706.39 | 1851.48 | 4132.25 | 3117.27 | 908.63 | 1059.99 | 2246.52 | 1612.55 | 1736.92 | 2107.86 | 1925.91 | 1558.20 | 1941.24 | 1335.45 | 1638.87 | 2912.18 | 2165.87 | 1343.19 | 1288.28 | 1336.99 | 2009.25 | 2490.44 | 2196.33 | 1910.26 | 970.84 | 5580.45 | 1424.08 | 3368.60 | 3404.76 | 4504.15 | 8251.73 | 6499.15 | 6581.49 | 6828.38 | 8442.24 | 8530.51 | 6109.92 | 6252.24 | 4902.91 | 1 | 23 | 21 | 30 | 25 | 34 | 31 | 39 | 38 | 14 | 16 | 33 | 56 | 16 | 20 | 27 | 26 | 35 | 38 | 38 | 26 | 21 | 25 | 17 | 47 | 27 | 22 | 30 | 19 | 38 | 32 | 34 | 34 | 21 | 41 | 28 | 67 | 74 | 69 | 107 | 108 | 102 | 124 | 159 | 142 | 122 | 119 | 110 | 208 | F | 73 | Engineer, automotive | Lancaster | 65+ | 11494.04 |
| 3 | 3418322859 | 0.00 | 5829.97 | 4646.60 | 6703.80 | 5745.56 | 6947.46 | 7834.35 | 8241.81 | 8215.51 | 6083.78 | 6130.85 | 4561.81 | 10911.14 | 5669.65 | 2633.59 | 3986.01 | 5264.24 | 5054.47 | 7329.56 | 5746.32 | 5304.16 | 7184.48 | 6439.11 | 3028.10 | 12515.35 | 5531.93 | 1924.89 | 5326.26 | 4138.59 | 4724.03 | 7412.62 | 7566.54 | 8705.06 | 4191.34 | 5032.46 | 3268.44 | 14907.34 | 4216.82 | 2783.59 | 5663.01 | 9328.76 | 7270.09 | 13990.58 | 7839.01 | 7342.55 | 6141.45 | 7249.93 | 5236.27 | 0 | 97 | 80 | 114 | 103 | 120 | 133 | 140 | 144 | 105 | 107 | 112 | 212 | 83 | 66 | 90 | 115 | 142 | 130 | 144 | 134 | 123 | 107 | 87 | 208 | 93 | 66 | 113 | 107 | 123 | 132 | 146 | 145 | 111 | 136 | 92 | 229 | 87 | 86 | 141 | 159 | 150 | 162 | 182 | 144 | 127 | 136 | 149 | 286 | F | 17 | Operational investment banker | Mountain View | <18 | 16095.44 |
| 4 | 4322238535 | 0.00 | 1369.44 | 1250.73 | 1634.28 | 2678.31 | 2658.26 | 2187.63 | 1445.66 | 4178.49 | 3196.41 | 1348.05 | 1565.03 | 5342.98 | 2539.69 | 918.26 | 1416.10 | 2397.57 | 1955.85 | 2892.05 | 1755.64 | 2013.18 | 1357.14 | 1609.49 | 1648.21 | 5087.10 | 737.62 | 1031.34 | 1175.36 | 981.97 | 1497.06 | 2837.26 | 2479.07 | 1076.47 | 1820.65 | 1609.43 | 2051.37 | 2576.63 | 1672.36 | 2434.82 | 3356.19 | 3257.14 | 4866.14 | 4084.15 | 3371.14 | 3609.39 | 4716.26 | 3044.20 | 3551.07 | 0 | 20 | 19 | 23 | 44 | 28 | 37 | 29 | 36 | 25 | 18 | 30 | 53 | 23 | 14 | 33 | 34 | 32 | 35 | 30 | 27 | 25 | 30 | 28 | 44 | 18 | 20 | 17 | 23 | 28 | 37 | 34 | 32 | 29 | 34 | 26 | 57 | 31 | 37 | 54 | 53 | 65 | 64 | 73 | 66 | 72 | 55 | 56 | 120 | M | 88 | Catering manager | Honolulu | 65+ | 7364.31 |
sns.boxplot(x='age_group', y='Target', data=trans_df, palette='Set3')
plt.title('Dec 2022 Spending by Age Group')
plt.xticks(rotation=45)
plt.show()
sns.boxplot(x='gender', y='Target', data=trans_df, palette='Set3')
plt.title('Dec 2022 Spending by Gender')
plt.xticks(rotation=45)
plt.show()
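The medians behind these boxplots can be read off numerically with a groupby — a sketch on a toy stand-in for `trans_df` (illustrative values):

```python
import pandas as pd

# Toy stand-in for trans_df with the columns the boxplots use.
toy = pd.DataFrame({
    "age_group": ["<18", "<18", "65+", "65+"],
    "gender": ["M", "F", "M", "F"],
    "Target": [100.0, 200.0, 50.0, 150.0],
})

# Median December-2022 spend per group: the statistic each box's center shows.
medians = toy.groupby("age_group")["Target"].median()
print(medians["<18"])  # → 150.0
```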
dec2022 = total_spent_per_month[total_spent_per_month['trans_month_year'] == '2022-12']
dec2022.head()
| acct_num | trans_month_year | category | age | gender | job | city | state | zip | trans_count | mean_amt | max_amt | min_amt | total_amt | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48 | 2348758451 | 2022-12 | travel | 42 | M | Surveyor, minerals | Rochester | NY | 14621 | 285 | 47.048211 | 1562.15 | 1.10 | 13408.74 |
| 97 | 2468061102 | 2022-12 | travel | 60 | F | Nurse, adult | Oceanside | CA | 92057 | 310 | 61.225613 | 1748.31 | 1.02 | 18979.94 |
| 146 | 3005591724 | 2022-12 | travel | 73 | F | Engineer, automotive | Lancaster | PA | 17601 | 208 | 55.259808 | 384.94 | 1.05 | 11494.04 |
| 194 | 3418322859 | 2022-12 | travel | 17 | F | Operational investment banker | Mountain View | CA | 94040 | 286 | 56.277762 | 1928.42 | 1.01 | 16095.44 |
| 242 | 4322238535 | 2022-12 | travel | 88 | M | Catering manager | Honolulu | HI | 96816 | 120 | 61.369250 | 245.56 | 1.10 | 7364.31 |
sns.boxplot(x='category', y='total_amt', data=dec2022, palette='Set3')
plt.title('Dec 2022 Spending by Category')
plt.xticks(rotation=45)
plt.show()
dec_2022 = pred_trans_df[pred_trans_df['trans_month_year'] == '2022-12']
pred_trans_df.head()
| ssn | cc_num | first | last | gender | street | city | state | zip | lat | long | city_pop | job | dob | acct_num | trans_num | unix_time | category | amt | is_fraud | merchant | merch_lat | merch_long | trans_date_time | trans_month_year | quarter | first_trans_month_year | age | month | year | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 91ab12e73ef38206e1121e9648d2408d | 1558719550 | gas_transport | 69.12 | 0 | Phillips Group | 39.491416 | -75.588522 | 2019-05-24 17:39:10 | 2019-05 | 2019Q2 | 2018-12-01 | 59 | 5 | 2019 |
| 1 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 071553d533a6822a4431c354c434ddcb | 1569425519 | grocery_pos | 68.11 | 0 | Tucker Ltd | 40.890319 | -75.573359 | 2019-09-25 15:31:59 | 2019-09 | 2019Q3 | 2018-12-01 | 59 | 9 | 2019 |
| 2 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 0cfad38ef15e4749eff68dc83f62c151 | 1577205601 | misc_net | 40.35 | 0 | Dixon PLC | 39.244958 | -74.475327 | 2019-12-24 16:40:01 | 2019-12 | 2019Q4 | 2018-12-01 | 59 | 12 | 2019 |
| 3 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 5782693d7c70f062f258cb30bfa8900f | 1571428238 | grocery_pos | 96.22 | 0 | Lambert-Cooper | 39.656925 | -75.802342 | 2019-10-18 19:50:38 | 2019-10 | 2019Q4 | 2018-12-01 | 59 | 10 | 2019 |
| 4 | 115-04-4507 | 4218196001337 | Jonathan | Johnson | M | 863 Lawrence Valleys | Ambler | PA | 19002 | 40.1809 | -75.2156 | 32412 | Accounting technician | 1959-10-03 | 888022315787 | 35fd7db657d7e30dd608c37f7798186e | 1549840400 | gas_transport | 71.89 | 0 | Griffith LLC | 40.313342 | -74.220434 | 2019-02-10 23:13:20 | 2019-02 | 2019Q1 | 2018-12-01 | 59 | 2 | 2019 |
trans_df.shape
(972, 104)
correlation = trans_df.corr(numeric_only=True)
plt.figure(figsize=(50, 50))
sns.heatmap(correlation, annot=True, cmap='YlGnBu')
plt.show()
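Beyond the heatmap, the `Target` column of the correlation matrix ranks predictors directly — a sketch on toy data (illustrative names; `Target` mirrors the real target column):

```python
import numpy as np
import pandas as pd

# Toy stand-in for trans_df: column "a" is engineered to track Target closely,
# "b" is independent noise.
rng = np.random.default_rng(1)
toy = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "Target"])
toy["a"] = toy["Target"] * 2 + rng.normal(scale=0.1, size=50)

# Rank features by absolute correlation with the target column.
corr_matrix = toy.corr(numeric_only=True)
ranked = corr_matrix["Target"].drop("Target").abs().sort_values(ascending=False)
assert ranked.index[0] == "a"  # the engineered column ranks first
```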
correlation
| total_amt', Period2018-12_M | total_amt', Period2019-01_M | total_amt', Period2019-02_M | total_amt', Period2019-03_M | total_amt', Period2019-04_M | total_amt', Period2019-05_M | total_amt', Period2019-06_M | total_amt', Period2019-07_M | total_amt', Period2019-08_M | total_amt', Period2019-09_M | total_amt', Period2019-10_M | total_amt', Period2019-11_M | total_amt', Period2019-12_M | total_amt', Period2020-01_M | total_amt', Period2020-02_M | total_amt', Period2020-03_M | total_amt', Period2020-04_M | total_amt', Period2020-05_M | total_amt', Period2020-06_M | total_amt', Period2020-07_M | total_amt', Period2020-08_M | total_amt', Period2020-09_M | total_amt', Period2020-10_M | total_amt', Period2020-11_M | total_amt', Period2020-12_M | total_amt', Period2021-01_M | total_amt', Period2021-02_M | total_amt', Period2021-03_M | total_amt', Period2021-04_M | total_amt', Period2021-05_M | total_amt', Period2021-06_M | total_amt', Period2021-07_M | total_amt', Period2021-08_M | total_amt', Period2021-09_M | total_amt', Period2021-10_M | total_amt', Period2021-11_M | total_amt', Period2021-12_M | total_amt', Period2022-01_M | total_amt', Period2022-02_M | total_amt', Period2022-03_M | total_amt', Period2022-04_M | total_amt', Period2022-05_M | total_amt', Period2022-06_M | total_amt', Period2022-07_M | total_amt', Period2022-08_M | total_amt', Period2022-09_M | total_amt', Period2022-10_M | total_amt', Period2022-11_M | trans_count', Period2018-12_M | trans_count', Period2019-01_M | trans_count', Period2019-02_M | trans_count', Period2019-03_M | trans_count', Period2019-04_M | trans_count', Period2019-05_M | trans_count', Period2019-06_M | trans_count', Period2019-07_M | trans_count', Period2019-08_M | trans_count', Period2019-09_M | trans_count', Period2019-10_M | trans_count', Period2019-11_M | trans_count', Period2019-12_M | trans_count', Period2020-01_M | trans_count', Period2020-02_M | trans_count', Period2020-03_M | trans_count', Period2020-04_M | trans_count', Period2020-05_M | trans_count', Period2020-06_M | trans_count', Period2020-07_M | trans_count', Period2020-08_M | trans_count', Period2020-09_M | trans_count', Period2020-10_M | trans_count', Period2020-11_M | trans_count', Period2020-12_M | trans_count', Period2021-01_M | trans_count', Period2021-02_M | trans_count', Period2021-03_M | trans_count', Period2021-04_M | trans_count', Period2021-05_M | trans_count', Period2021-06_M | trans_count', Period2021-07_M | trans_count', Period2021-08_M | trans_count', Period2021-09_M | trans_count', Period2021-10_M | trans_count', Period2021-11_M | trans_count', Period2021-12_M | trans_count', Period2022-01_M | trans_count', Period2022-02_M | trans_count', Period2022-03_M | trans_count', Period2022-04_M | trans_count', Period2022-05_M | trans_count', Period2022-06_M | trans_count', Period2022-07_M | trans_count', Period2022-08_M | trans_count', Period2022-09_M | trans_count', Period2022-10_M | trans_count', Period2022-11_M | trans_count', Period2022-12_M | age | Target | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| total_amt', Period2018-12_M | 1.000000 | 0.532484 | 0.522456 | 0.511195 | 0.530995 | 0.517475 | 0.529913 | 0.530591 | 0.521404 | 0.539699 | 0.535104 | 0.520855 | 0.523571 | 0.515719 | 0.529609 | 0.513539 | 0.485701 | 0.446822 | 0.445354 | 0.501977 | 0.477245 | 0.486197 | 0.466752 | 0.458473 | 0.505537 | 0.475332 | 0.415254 | 0.482738 | 0.477579 | 0.460709 | 0.461116 | 0.478910 | 0.475206 | 0.483292 | 0.476297 | 0.446306 | 0.471941 | 0.425631 | 0.423659 | 0.413413 | 0.420878 | 0.433582 | 0.408367 | 0.437908 | 0.399066 | 0.419300 | 0.393458 | 0.405663 | 0.758131 | 0.452343 | 0.462406 | 0.458912 | 0.468279 | 0.454956 | 0.462828 | 0.464256 | 0.456326 | 0.470820 | 0.459396 | 0.458764 | 0.461434 | 0.470286 | 0.475649 | 0.457420 | 0.453956 | 0.449220 | 0.464791 | 0.470284 | 0.468283 | 0.471719 | 0.454737 | 0.459383 | 0.472859 | 0.462823 | 0.441932 | 0.469830 | 0.460774 | 0.456129 | 0.464774 | 0.463455 | 0.463294 | 0.458722 | 0.466350 | 0.446187 | 0.463210 | 0.391451 | 0.396722 | 0.382438 | 0.384457 | 0.380524 | 0.390909 | 0.401394 | 0.383796 | 0.388190 | 0.382005 | 0.384506 | 0.386503 | -0.183845 | 0.434274 |
| total_amt', Period2019-01_M | 0.532484 | 1.000000 | 0.916214 | 0.937136 | 0.938201 | 0.937005 | 0.944725 | 0.945612 | 0.947833 | 0.937074 | 0.938838 | 0.933687 | 0.949845 | 0.927369 | 0.914979 | 0.935022 | 0.856693 | 0.842186 | 0.808883 | 0.849753 | 0.835867 | 0.824464 | 0.821580 | 0.800321 | 0.875595 | 0.800278 | 0.773230 | 0.837261 | 0.801300 | 0.808359 | 0.837400 | 0.847710 | 0.778866 | 0.797004 | 0.834971 | 0.789355 | 0.866196 | 0.738596 | 0.723252 | 0.710342 | 0.727539 | 0.730376 | 0.716978 | 0.706839 | 0.712005 | 0.723632 | 0.685961 | 0.683284 | 0.469361 | 0.867305 | 0.827236 | 0.834492 | 0.837639 | 0.832431 | 0.842734 | 0.842286 | 0.843911 | 0.836527 | 0.843813 | 0.847733 | 0.845453 | 0.835428 | 0.840678 | 0.838266 | 0.831728 | 0.832231 | 0.843920 | 0.840544 | 0.832978 | 0.843401 | 0.831950 | 0.838680 | 0.850894 | 0.826641 | 0.825560 | 0.838643 | 0.828041 | 0.834356 | 0.837536 | 0.833660 | 0.835547 | 0.837525 | 0.839388 | 0.830636 | 0.838665 | 0.711435 | 0.688998 | 0.678354 | 0.681420 | 0.693669 | 0.698531 | 0.689756 | 0.689761 | 0.684164 | 0.692214 | 0.681917 | 0.691093 | -0.314821 | 0.756122 |
| total_amt', Period2019-02_M | 0.522456 | 0.916214 | 1.000000 | 0.928625 | 0.933851 | 0.932109 | 0.938462 | 0.935318 | 0.933530 | 0.935102 | 0.935894 | 0.921929 | 0.936889 | 0.929482 | 0.906812 | 0.919464 | 0.854616 | 0.836868 | 0.801239 | 0.845848 | 0.824963 | 0.813410 | 0.814041 | 0.794866 | 0.864707 | 0.790995 | 0.767196 | 0.836871 | 0.796464 | 0.800576 | 0.832140 | 0.839360 | 0.790242 | 0.791644 | 0.824704 | 0.786317 | 0.869827 | 0.720941 | 0.714472 | 0.697890 | 0.707402 | 0.716387 | 0.700699 | 0.696530 | 0.703610 | 0.710236 | 0.670724 | 0.671849 | 0.462963 | 0.821353 | 0.870161 | 0.830100 | 0.834151 | 0.828582 | 0.838449 | 0.836692 | 0.833723 | 0.835141 | 0.837570 | 0.837294 | 0.837470 | 0.841366 | 0.830363 | 0.831369 | 0.826092 | 0.831807 | 0.837182 | 0.835129 | 0.827187 | 0.833601 | 0.825370 | 0.834284 | 0.843543 | 0.821560 | 0.816118 | 0.829301 | 0.819529 | 0.829209 | 0.833878 | 0.829774 | 0.835839 | 0.831459 | 0.833806 | 0.827235 | 0.834301 | 0.692535 | 0.673323 | 0.661588 | 0.663667 | 0.677460 | 0.675784 | 0.674317 | 0.670360 | 0.668236 | 0.671368 | 0.666378 | 0.672320 | -0.308327 | 0.736216 |
| total_amt', Period2019-03_M | 0.511195 | 0.937136 | 0.928625 | 1.000000 | 0.944329 | 0.946561 | 0.953934 | 0.955878 | 0.952494 | 0.944929 | 0.949748 | 0.942166 | 0.952763 | 0.929830 | 0.914640 | 0.936724 | 0.866597 | 0.849458 | 0.812309 | 0.860593 | 0.849638 | 0.835887 | 0.828208 | 0.827572 | 0.877416 | 0.800524 | 0.781997 | 0.842405 | 0.809728 | 0.821268 | 0.847173 | 0.841998 | 0.788268 | 0.807852 | 0.843001 | 0.787829 | 0.878504 | 0.744890 | 0.729361 | 0.718262 | 0.728907 | 0.732597 | 0.720922 | 0.711025 | 0.717676 | 0.725661 | 0.692241 | 0.684701 | 0.468197 | 0.835327 | 0.837044 | 0.870714 | 0.844074 | 0.842022 | 0.852381 | 0.850870 | 0.849431 | 0.848160 | 0.852769 | 0.854246 | 0.852151 | 0.848167 | 0.844247 | 0.847701 | 0.845664 | 0.844811 | 0.850482 | 0.848804 | 0.844382 | 0.853677 | 0.837464 | 0.849299 | 0.856097 | 0.829909 | 0.829020 | 0.845552 | 0.834987 | 0.847176 | 0.846073 | 0.839410 | 0.845161 | 0.849971 | 0.845954 | 0.834370 | 0.849719 | 0.712484 | 0.689472 | 0.680758 | 0.683252 | 0.694329 | 0.697214 | 0.688379 | 0.690341 | 0.682826 | 0.691742 | 0.687030 | 0.691127 | -0.317154 | 0.754889 |
| total_amt', Period2019-04_M | 0.530995 | 0.938201 | 0.933851 | 0.944329 | 1.000000 | 0.951047 | 0.951196 | 0.955496 | 0.950751 | 0.947074 | 0.947989 | 0.943317 | 0.954231 | 0.935805 | 0.913551 | 0.941970 | 0.870734 | 0.851876 | 0.816102 | 0.860193 | 0.849607 | 0.833377 | 0.834691 | 0.821159 | 0.879438 | 0.801753 | 0.783943 | 0.841836 | 0.817693 | 0.820705 | 0.845892 | 0.850282 | 0.785117 | 0.794618 | 0.845026 | 0.782626 | 0.876798 | 0.754006 | 0.736801 | 0.723432 | 0.741517 | 0.742116 | 0.720047 | 0.721130 | 0.720828 | 0.729735 | 0.703767 | 0.687573 | 0.466034 | 0.835619 | 0.840877 | 0.843371 | 0.874956 | 0.845523 | 0.852257 | 0.850693 | 0.849340 | 0.846332 | 0.852477 | 0.855461 | 0.852981 | 0.850313 | 0.844559 | 0.850761 | 0.842492 | 0.843618 | 0.851377 | 0.849786 | 0.844117 | 0.852200 | 0.839617 | 0.848311 | 0.857726 | 0.836577 | 0.830156 | 0.841997 | 0.836232 | 0.846324 | 0.846305 | 0.841950 | 0.845408 | 0.845724 | 0.847967 | 0.835301 | 0.846067 | 0.720980 | 0.698612 | 0.691083 | 0.691615 | 0.703747 | 0.701992 | 0.698905 | 0.698208 | 0.690579 | 0.701867 | 0.694853 | 0.699185 | -0.318468 | 0.760828 |
| total_amt', Period2019-05_M | 0.517475 | 0.937005 | 0.932109 | 0.946561 | 0.951047 | 1.000000 | 0.953899 | 0.953930 | 0.955459 | 0.946753 | 0.951378 | 0.944847 | 0.957814 | 0.939190 | 0.920633 | 0.939572 | 0.865240 | 0.845355 | 0.823189 | 0.857898 | 0.849753 | 0.841499 | 0.837901 | 0.821685 | 0.881403 | 0.800141 | 0.780155 | 0.840020 | 0.811467 | 0.820852 | 0.845813 | 0.855812 | 0.792227 | 0.808272 | 0.844530 | 0.797268 | 0.880724 | 0.747312 | 0.735611 | 0.716229 | 0.738930 | 0.736502 | 0.731702 | 0.715696 | 0.720976 | 0.726998 | 0.699281 | 0.687091 | 0.465663 | 0.841041 | 0.842734 | 0.847101 | 0.849373 | 0.873533 | 0.854790 | 0.853106 | 0.854924 | 0.850055 | 0.856715 | 0.857707 | 0.858691 | 0.853196 | 0.848352 | 0.849106 | 0.845087 | 0.846251 | 0.854238 | 0.853096 | 0.846604 | 0.858116 | 0.844983 | 0.852801 | 0.861938 | 0.835303 | 0.836149 | 0.846114 | 0.837999 | 0.852313 | 0.847810 | 0.846822 | 0.849642 | 0.849384 | 0.850287 | 0.842099 | 0.852541 | 0.720615 | 0.699520 | 0.687619 | 0.693059 | 0.703482 | 0.705345 | 0.700335 | 0.699899 | 0.690638 | 0.701191 | 0.692597 | 0.699547 | -0.320345 | 0.762687 |
| total_amt', Period2019-06_M | 0.529913 | 0.944725 | 0.938462 | 0.953934 | 0.951196 | 0.953899 | 1.000000 | 0.962359 | 0.960953 | 0.956463 | 0.953610 | 0.951268 | 0.963040 | 0.942800 | 0.930568 | 0.941453 | 0.876298 | 0.853907 | 0.817987 | 0.863368 | 0.848563 | 0.836587 | 0.833242 | 0.824595 | 0.879509 | 0.804130 | 0.782277 | 0.846208 | 0.810119 | 0.823106 | 0.847522 | 0.850500 | 0.794651 | 0.814241 | 0.843420 | 0.791439 | 0.886241 | 0.749527 | 0.737941 | 0.718949 | 0.739820 | 0.737101 | 0.720356 | 0.714604 | 0.719936 | 0.725851 | 0.694761 | 0.691536 | 0.468306 | 0.840378 | 0.838535 | 0.846845 | 0.845405 | 0.844163 | 0.874446 | 0.852801 | 0.851066 | 0.848782 | 0.851791 | 0.857041 | 0.854831 | 0.847633 | 0.848464 | 0.846618 | 0.842380 | 0.845728 | 0.856488 | 0.851485 | 0.844402 | 0.851387 | 0.841978 | 0.850541 | 0.858753 | 0.833486 | 0.832801 | 0.846650 | 0.834347 | 0.845410 | 0.846228 | 0.842755 | 0.848757 | 0.846544 | 0.850755 | 0.837605 | 0.848754 | 0.714582 | 0.693278 | 0.683926 | 0.686240 | 0.697467 | 0.698274 | 0.694527 | 0.693276 | 0.684839 | 0.695884 | 0.688481 | 0.693433 | -0.327645 | 0.758815 |
| total_amt', Period2019-07_M | 0.530591 | 0.945612 | 0.935318 | 0.955878 | 0.955496 | 0.953930 | 0.962359 | 1.000000 | 0.960179 | 0.953639 | 0.955936 | 0.951293 | 0.963679 | 0.939604 | 0.928270 | 0.948570 | 0.872090 | 0.856496 | 0.817224 | 0.862849 | 0.851660 | 0.843389 | 0.842050 | 0.825836 | 0.882305 | 0.805844 | 0.788315 | 0.846252 | 0.817480 | 0.818173 | 0.849906 | 0.858493 | 0.787473 | 0.814664 | 0.851908 | 0.796234 | 0.884223 | 0.752446 | 0.739745 | 0.725418 | 0.745081 | 0.744286 | 0.723619 | 0.726857 | 0.726148 | 0.730342 | 0.707365 | 0.697302 | 0.478752 | 0.841287 | 0.841762 | 0.848015 | 0.847243 | 0.845775 | 0.857282 | 0.876209 | 0.850850 | 0.850547 | 0.853757 | 0.859127 | 0.857003 | 0.851245 | 0.848025 | 0.852657 | 0.844680 | 0.847354 | 0.857067 | 0.853267 | 0.845227 | 0.856910 | 0.842750 | 0.854140 | 0.860508 | 0.833541 | 0.834782 | 0.846012 | 0.837847 | 0.850273 | 0.848157 | 0.844969 | 0.848376 | 0.849497 | 0.852688 | 0.838165 | 0.852593 | 0.718733 | 0.700187 | 0.690842 | 0.692114 | 0.703907 | 0.705526 | 0.702023 | 0.700026 | 0.693268 | 0.703741 | 0.692095 | 0.700103 | -0.330162 | 0.767414 |
| total_amt', Period2019-08_M | 0.521404 | 0.947833 | 0.933530 | 0.952494 | 0.950751 | 0.955459 | 0.960953 | 0.960179 | 1.000000 | 0.954425 | 0.955215 | 0.949453 | 0.961861 | 0.943884 | 0.927275 | 0.943664 | 0.869102 | 0.850315 | 0.823379 | 0.862467 | 0.846218 | 0.836732 | 0.831373 | 0.821638 | 0.884477 | 0.808699 | 0.792435 | 0.846651 | 0.820024 | 0.819695 | 0.852928 | 0.863354 | 0.789950 | 0.810299 | 0.851879 | 0.793491 | 0.891064 | 0.748756 | 0.734571 | 0.710101 | 0.731262 | 0.734056 | 0.716172 | 0.715725 | 0.715935 | 0.725794 | 0.694738 | 0.690413 | 0.467535 | 0.843100 | 0.840316 | 0.848133 | 0.846596 | 0.845386 | 0.857034 | 0.853536 | 0.872691 | 0.852616 | 0.852823 | 0.858899 | 0.856632 | 0.854692 | 0.850126 | 0.850201 | 0.845209 | 0.845661 | 0.856239 | 0.853515 | 0.845346 | 0.856603 | 0.841874 | 0.855877 | 0.860508 | 0.834926 | 0.836035 | 0.847280 | 0.837990 | 0.849375 | 0.849608 | 0.845519 | 0.849981 | 0.851840 | 0.851657 | 0.838768 | 0.852516 | 0.715572 | 0.692499 | 0.680239 | 0.683681 | 0.696728 | 0.694803 | 0.692115 | 0.691264 | 0.685406 | 0.693420 | 0.685243 | 0.691406 | -0.318723 | 0.757570 |
| total_amt', Period2019-09_M | 0.539699 | 0.937074 | 0.935102 | 0.944929 | 0.947074 | 0.946753 | 0.956463 | 0.953639 | 0.954425 | 1.000000 | 0.953389 | 0.943442 | 0.954467 | 0.932251 | 0.919621 | 0.935227 | 0.866395 | 0.844398 | 0.817040 | 0.857196 | 0.839314 | 0.841009 | 0.837542 | 0.819573 | 0.875705 | 0.809148 | 0.775294 | 0.840272 | 0.809211 | 0.813777 | 0.840542 | 0.849302 | 0.780298 | 0.799661 | 0.844426 | 0.787031 | 0.876235 | 0.748833 | 0.734449 | 0.715830 | 0.737366 | 0.736075 | 0.723171 | 0.719929 | 0.712678 | 0.726541 | 0.698218 | 0.684080 | 0.484111 | 0.834109 | 0.839286 | 0.843196 | 0.840499 | 0.841219 | 0.849991 | 0.847656 | 0.848204 | 0.872030 | 0.851707 | 0.850842 | 0.849773 | 0.845091 | 0.843241 | 0.842754 | 0.837955 | 0.840443 | 0.848818 | 0.848264 | 0.842051 | 0.853002 | 0.839597 | 0.845751 | 0.854474 | 0.833616 | 0.824318 | 0.840854 | 0.833288 | 0.842423 | 0.842850 | 0.841096 | 0.844746 | 0.842654 | 0.846919 | 0.833687 | 0.846134 | 0.717429 | 0.694365 | 0.682877 | 0.688189 | 0.696022 | 0.697622 | 0.696155 | 0.692092 | 0.685192 | 0.695329 | 0.687185 | 0.694720 | -0.324954 | 0.761886 |
| total_amt', Period2019-10_M | 0.535104 | 0.938838 | 0.935894 | 0.949748 | 0.947989 | 0.951378 | 0.953610 | 0.955936 | 0.955215 | 0.953389 | 1.000000 | 0.943223 | 0.956162 | 0.936699 | 0.923176 | 0.939573 | 0.861699 | 0.844712 | 0.815067 | 0.862132 | 0.845172 | 0.839838 | 0.837094 | 0.818562 | 0.881798 | 0.804190 | 0.780733 | 0.841768 | 0.814384 | 0.809214 | 0.847662 | 0.848022 | 0.790080 | 0.805233 | 0.837020 | 0.788109 | 0.878820 | 0.746373 | 0.727265 | 0.713202 | 0.731437 | 0.733245 | 0.716529 | 0.714106 | 0.716521 | 0.719490 | 0.690752 | 0.684669 | 0.482206 | 0.839741 | 0.842873 | 0.847223 | 0.846124 | 0.846933 | 0.854744 | 0.852597 | 0.852617 | 0.852242 | 0.875611 | 0.856160 | 0.855043 | 0.854185 | 0.847986 | 0.848301 | 0.841783 | 0.844232 | 0.853699 | 0.850074 | 0.845952 | 0.857104 | 0.841540 | 0.851632 | 0.859154 | 0.833951 | 0.831182 | 0.846880 | 0.837649 | 0.845420 | 0.847209 | 0.843144 | 0.847269 | 0.845240 | 0.851644 | 0.835088 | 0.848271 | 0.715502 | 0.686305 | 0.681231 | 0.682674 | 0.695304 | 0.697443 | 0.692981 | 0.692508 | 0.681990 | 0.692692 | 0.686582 | 0.691326 | -0.317660 | 0.754560 |
*(Wide correlation-matrix output truncated for readability: one row per `('total_amt', Period('YYYY-MM', 'M'))` column, covering 2019-11 through 2022-07. Each row's diagonal entry is 1.000000; the monthly `total_amt` columns are strongly positively correlated with one another (mostly 0.6–0.95), correlations weaken noticeably between pre-2022 and 2022 months, and one column of the matrix is consistently negative (roughly -0.24 to -0.41).)*
| total_amt', Period2022-08_M | 0.399066 | 0.712005 | 0.703610 | 0.717676 | 0.720828 | 0.720976 | 0.719936 | 0.726148 | 0.715935 | 0.712678 | 0.716521 | 0.721674 | 0.730632 | 0.707351 | 0.710895 | 0.723762 | 0.667004 | 0.659361 | 0.619612 | 0.652286 | 0.657128 | 0.666492 | 0.630037 | 0.640429 | 0.672354 | 0.655386 | 0.619095 | 0.666305 | 0.634151 | 0.659245 | 0.682974 | 0.681045 | 0.632242 | 0.640626 | 0.665522 | 0.639084 | 0.696125 | 0.858190 | 0.856983 | 0.852346 | 0.858111 | 0.879291 | 0.854271 | 0.860964 | 1.000000 | 0.857891 | 0.837032 | 0.808178 | 0.376166 | 0.665294 | 0.665598 | 0.673192 | 0.669788 | 0.669503 | 0.674201 | 0.683420 | 0.670033 | 0.667713 | 0.667969 | 0.682814 | 0.680643 | 0.676958 | 0.674987 | 0.673833 | 0.668325 | 0.674031 | 0.678393 | 0.673480 | 0.675484 | 0.685453 | 0.663608 | 0.676948 | 0.681382 | 0.673372 | 0.660545 | 0.676221 | 0.669028 | 0.685333 | 0.681682 | 0.678806 | 0.677184 | 0.683037 | 0.678189 | 0.678653 | 0.676513 | 0.873827 | 0.881946 | 0.883200 | 0.885347 | 0.881052 | 0.887219 | 0.885121 | 0.906348 | 0.879271 | 0.885680 | 0.884814 | 0.888905 | -0.404256 | 0.887157 |
| total_amt', Period2022-09_M | 0.419300 | 0.723632 | 0.710236 | 0.725661 | 0.729735 | 0.726998 | 0.725851 | 0.730342 | 0.725794 | 0.726541 | 0.719490 | 0.722827 | 0.740601 | 0.724894 | 0.717045 | 0.734158 | 0.674628 | 0.657596 | 0.638981 | 0.662503 | 0.659793 | 0.664407 | 0.639901 | 0.644054 | 0.660114 | 0.672920 | 0.608305 | 0.667574 | 0.627048 | 0.637699 | 0.668824 | 0.680159 | 0.638999 | 0.640622 | 0.682014 | 0.641317 | 0.697670 | 0.865026 | 0.879038 | 0.858093 | 0.868710 | 0.875752 | 0.861619 | 0.852953 | 0.857891 | 1.000000 | 0.833670 | 0.817470 | 0.378093 | 0.650808 | 0.651992 | 0.655440 | 0.661375 | 0.657373 | 0.657397 | 0.660513 | 0.654826 | 0.657853 | 0.657038 | 0.662240 | 0.664108 | 0.668220 | 0.662093 | 0.660239 | 0.653223 | 0.655469 | 0.669629 | 0.662167 | 0.665591 | 0.667853 | 0.647007 | 0.660919 | 0.665501 | 0.667436 | 0.643732 | 0.663677 | 0.653002 | 0.665010 | 0.666665 | 0.664568 | 0.661227 | 0.670442 | 0.664846 | 0.658287 | 0.658597 | 0.853545 | 0.863684 | 0.859313 | 0.860971 | 0.859493 | 0.868626 | 0.863077 | 0.865410 | 0.881175 | 0.859560 | 0.863772 | 0.866008 | -0.402856 | 0.878777 |
| total_amt', Period2022-10_M | 0.393458 | 0.685961 | 0.670724 | 0.692241 | 0.703767 | 0.699281 | 0.694761 | 0.707365 | 0.694738 | 0.698218 | 0.690752 | 0.701690 | 0.710046 | 0.688702 | 0.691949 | 0.702185 | 0.646981 | 0.637444 | 0.614928 | 0.640796 | 0.632172 | 0.635497 | 0.625242 | 0.643198 | 0.648611 | 0.638459 | 0.602918 | 0.653069 | 0.596953 | 0.635143 | 0.651563 | 0.641775 | 0.595940 | 0.614235 | 0.642610 | 0.611075 | 0.671312 | 0.851335 | 0.845883 | 0.839957 | 0.848555 | 0.854501 | 0.833054 | 0.845685 | 0.837032 | 0.833670 | 1.000000 | 0.795307 | 0.355840 | 0.626397 | 0.620035 | 0.629361 | 0.634533 | 0.633480 | 0.626954 | 0.642347 | 0.629003 | 0.631144 | 0.631191 | 0.645508 | 0.638280 | 0.639239 | 0.635537 | 0.630193 | 0.632313 | 0.634111 | 0.645963 | 0.638019 | 0.634964 | 0.646139 | 0.621966 | 0.643391 | 0.645853 | 0.639584 | 0.621754 | 0.640763 | 0.624286 | 0.646771 | 0.647751 | 0.634153 | 0.634086 | 0.642370 | 0.642102 | 0.637462 | 0.635385 | 0.837831 | 0.840799 | 0.843504 | 0.844079 | 0.841560 | 0.846076 | 0.845524 | 0.844315 | 0.841544 | 0.870270 | 0.842455 | 0.840952 | -0.386815 | 0.855533 |
| total_amt', Period2022-11_M | 0.405663 | 0.683284 | 0.671849 | 0.684701 | 0.687573 | 0.687091 | 0.691536 | 0.697302 | 0.690413 | 0.684080 | 0.684669 | 0.689194 | 0.704329 | 0.687473 | 0.680069 | 0.693752 | 0.627969 | 0.613387 | 0.605353 | 0.640435 | 0.621899 | 0.635092 | 0.609468 | 0.607116 | 0.640331 | 0.621594 | 0.587923 | 0.645942 | 0.606619 | 0.609416 | 0.629076 | 0.643938 | 0.598759 | 0.599072 | 0.625860 | 0.609270 | 0.658042 | 0.823180 | 0.824174 | 0.826637 | 0.818492 | 0.831472 | 0.808981 | 0.799701 | 0.808178 | 0.817470 | 0.795307 | 1.000000 | 0.379678 | 0.620293 | 0.618865 | 0.623068 | 0.624223 | 0.623864 | 0.626933 | 0.632336 | 0.621729 | 0.622780 | 0.629101 | 0.633275 | 0.632905 | 0.635456 | 0.631277 | 0.625703 | 0.620448 | 0.621132 | 0.638052 | 0.628793 | 0.630334 | 0.638941 | 0.610827 | 0.626596 | 0.639989 | 0.633427 | 0.618596 | 0.634508 | 0.630324 | 0.632335 | 0.635841 | 0.627578 | 0.630652 | 0.630006 | 0.633811 | 0.625715 | 0.629749 | 0.812652 | 0.816538 | 0.829292 | 0.813356 | 0.822461 | 0.824874 | 0.819645 | 0.825852 | 0.818578 | 0.821202 | 0.847240 | 0.822226 | -0.398418 | 0.831043 |
| trans_count', Period2018-12_M | 0.758131 | 0.469361 | 0.462963 | 0.468197 | 0.466034 | 0.465663 | 0.468306 | 0.478752 | 0.467535 | 0.484111 | 0.482206 | 0.467490 | 0.472337 | 0.449970 | 0.469047 | 0.464206 | 0.485464 | 0.479081 | 0.472909 | 0.510456 | 0.501250 | 0.490791 | 0.485507 | 0.482933 | 0.513899 | 0.498540 | 0.453495 | 0.497957 | 0.492438 | 0.477867 | 0.492030 | 0.496206 | 0.484118 | 0.501903 | 0.483651 | 0.462200 | 0.498203 | 0.368353 | 0.350528 | 0.337983 | 0.348927 | 0.381591 | 0.370465 | 0.401852 | 0.376166 | 0.378093 | 0.355840 | 0.379678 | 1.000000 | 0.532151 | 0.533577 | 0.541968 | 0.534982 | 0.530538 | 0.531755 | 0.542062 | 0.532554 | 0.546454 | 0.541983 | 0.536613 | 0.538243 | 0.535485 | 0.535265 | 0.534837 | 0.539324 | 0.527816 | 0.543886 | 0.545793 | 0.542361 | 0.542939 | 0.533050 | 0.527431 | 0.547133 | 0.540361 | 0.529033 | 0.546873 | 0.539859 | 0.532083 | 0.544141 | 0.543936 | 0.541859 | 0.533599 | 0.541509 | 0.533113 | 0.547174 | 0.431212 | 0.423790 | 0.410006 | 0.408992 | 0.412935 | 0.416941 | 0.421444 | 0.417321 | 0.416938 | 0.409012 | 0.412859 | 0.414490 | -0.188891 | 0.399576 |
| trans_count', Period2019-01_M | 0.452343 | 0.867305 | 0.821353 | 0.835327 | 0.835619 | 0.841041 | 0.840378 | 0.841287 | 0.843100 | 0.834109 | 0.839741 | 0.822596 | 0.838494 | 0.813013 | 0.805934 | 0.829202 | 0.850917 | 0.864858 | 0.831444 | 0.862340 | 0.867221 | 0.834040 | 0.847965 | 0.827033 | 0.885136 | 0.826927 | 0.810037 | 0.848762 | 0.833457 | 0.821174 | 0.867395 | 0.858029 | 0.803068 | 0.796010 | 0.839599 | 0.797547 | 0.882349 | 0.639137 | 0.594229 | 0.578121 | 0.602247 | 0.631492 | 0.663735 | 0.646107 | 0.665294 | 0.650808 | 0.626397 | 0.620293 | 0.532151 | 1.000000 | 0.946259 | 0.956943 | 0.954025 | 0.959670 | 0.961990 | 0.962040 | 0.963675 | 0.955838 | 0.959507 | 0.959950 | 0.967187 | 0.948273 | 0.952335 | 0.959841 | 0.958016 | 0.958585 | 0.963606 | 0.960584 | 0.962100 | 0.956810 | 0.957884 | 0.955395 | 0.965385 | 0.951658 | 0.952330 | 0.959168 | 0.956031 | 0.954068 | 0.958808 | 0.958951 | 0.960164 | 0.954914 | 0.956624 | 0.957738 | 0.964976 | 0.775488 | 0.735984 | 0.731423 | 0.733094 | 0.746574 | 0.746206 | 0.741239 | 0.742995 | 0.740085 | 0.741331 | 0.732299 | 0.744807 | -0.347326 | 0.692425 |
| trans_count', Period2019-02_M | 0.462406 | 0.827236 | 0.870161 | 0.837044 | 0.840877 | 0.842734 | 0.838535 | 0.841762 | 0.840316 | 0.839286 | 0.842873 | 0.822838 | 0.837416 | 0.818792 | 0.812655 | 0.823799 | 0.849549 | 0.864501 | 0.823604 | 0.862528 | 0.862084 | 0.832155 | 0.845926 | 0.823170 | 0.884840 | 0.822855 | 0.811205 | 0.853107 | 0.832401 | 0.820285 | 0.872866 | 0.857599 | 0.815242 | 0.793907 | 0.844295 | 0.802583 | 0.887027 | 0.632641 | 0.600171 | 0.581321 | 0.601175 | 0.628957 | 0.658202 | 0.651974 | 0.665598 | 0.651992 | 0.620035 | 0.618865 | 0.533577 | 0.946259 | 1.000000 | 0.953475 | 0.954259 | 0.955048 | 0.956958 | 0.958207 | 0.957097 | 0.956091 | 0.957601 | 0.953654 | 0.962462 | 0.949588 | 0.946526 | 0.954489 | 0.956433 | 0.957202 | 0.957761 | 0.958784 | 0.958082 | 0.950765 | 0.952482 | 0.955189 | 0.964405 | 0.949921 | 0.948162 | 0.953344 | 0.951654 | 0.953512 | 0.956888 | 0.957682 | 0.959193 | 0.953113 | 0.954428 | 0.955970 | 0.962359 | 0.769627 | 0.732719 | 0.725674 | 0.729947 | 0.739931 | 0.737556 | 0.736602 | 0.735203 | 0.734834 | 0.733239 | 0.727435 | 0.737034 | -0.343110 | 0.687360 |
| trans_count', Period2019-03_M | 0.458912 | 0.834492 | 0.830100 | 0.870714 | 0.843371 | 0.847101 | 0.846845 | 0.848015 | 0.848133 | 0.843196 | 0.847223 | 0.830398 | 0.840928 | 0.813338 | 0.808823 | 0.825855 | 0.852736 | 0.867290 | 0.827087 | 0.872208 | 0.872678 | 0.843911 | 0.852320 | 0.839549 | 0.884587 | 0.828691 | 0.816737 | 0.853374 | 0.838989 | 0.834354 | 0.875990 | 0.856516 | 0.814943 | 0.807301 | 0.848654 | 0.797377 | 0.891025 | 0.647443 | 0.604270 | 0.585532 | 0.609924 | 0.632642 | 0.666454 | 0.654080 | 0.673192 | 0.655440 | 0.629361 | 0.623068 | 0.541968 | 0.956943 | 0.953475 | 1.000000 | 0.962186 | 0.966485 | 0.969900 | 0.967576 | 0.970505 | 0.965686 | 0.968131 | 0.965925 | 0.973813 | 0.959636 | 0.956855 | 0.964288 | 0.967789 | 0.968945 | 0.967260 | 0.967740 | 0.969285 | 0.964522 | 0.963488 | 0.964573 | 0.971300 | 0.957540 | 0.956810 | 0.965124 | 0.963648 | 0.966080 | 0.967744 | 0.966647 | 0.967415 | 0.963395 | 0.965989 | 0.963370 | 0.973897 | 0.780958 | 0.741742 | 0.736975 | 0.741081 | 0.749054 | 0.750041 | 0.745717 | 0.747340 | 0.743815 | 0.744029 | 0.739627 | 0.749861 | -0.357593 | 0.696378 |
| trans_count', Period2019-04_M | 0.468279 | 0.837639 | 0.834151 | 0.844074 | 0.874956 | 0.849373 | 0.845405 | 0.847243 | 0.846596 | 0.840499 | 0.846124 | 0.831155 | 0.842828 | 0.817815 | 0.812738 | 0.831412 | 0.860531 | 0.869178 | 0.832778 | 0.872781 | 0.871187 | 0.842371 | 0.853066 | 0.838775 | 0.890986 | 0.828320 | 0.817614 | 0.854308 | 0.844775 | 0.829801 | 0.879482 | 0.858813 | 0.813133 | 0.795793 | 0.848252 | 0.796717 | 0.890029 | 0.648542 | 0.610641 | 0.586439 | 0.616415 | 0.635399 | 0.667778 | 0.659429 | 0.669788 | 0.661375 | 0.634533 | 0.624223 | 0.534982 | 0.954025 | 0.954259 | 0.962186 | 1.000000 | 0.965587 | 0.966684 | 0.965855 | 0.966840 | 0.960908 | 0.966326 | 0.963155 | 0.971309 | 0.955482 | 0.955960 | 0.964520 | 0.964024 | 0.963009 | 0.966872 | 0.965654 | 0.966105 | 0.960219 | 0.961558 | 0.962062 | 0.971439 | 0.956711 | 0.953235 | 0.961115 | 0.962026 | 0.962135 | 0.964745 | 0.963313 | 0.965198 | 0.960395 | 0.962532 | 0.959376 | 0.968034 | 0.780228 | 0.745064 | 0.738785 | 0.740690 | 0.753052 | 0.750198 | 0.747987 | 0.747760 | 0.746225 | 0.747917 | 0.741938 | 0.750305 | -0.347712 | 0.698029 |
| trans_count', Period2019-05_M | 0.454956 | 0.832431 | 0.828582 | 0.842022 | 0.845523 | 0.873533 | 0.844163 | 0.845775 | 0.845386 | 0.841219 | 0.846933 | 0.828212 | 0.840135 | 0.813124 | 0.811102 | 0.826711 | 0.851263 | 0.863429 | 0.835224 | 0.860464 | 0.869819 | 0.845875 | 0.856400 | 0.836375 | 0.887294 | 0.828446 | 0.809714 | 0.848513 | 0.840916 | 0.825002 | 0.873069 | 0.859492 | 0.812509 | 0.800381 | 0.849656 | 0.798538 | 0.890360 | 0.644262 | 0.607653 | 0.586210 | 0.614099 | 0.633001 | 0.669674 | 0.653672 | 0.669503 | 0.657373 | 0.633480 | 0.623864 | 0.530538 | 0.959670 | 0.955048 | 0.966485 | 0.965587 | 1.000000 | 0.968830 | 0.969324 | 0.970262 | 0.966035 | 0.968919 | 0.965026 | 0.975987 | 0.957584 | 0.957696 | 0.963518 | 0.965749 | 0.967295 | 0.967208 | 0.965886 | 0.970198 | 0.965274 | 0.966138 | 0.963385 | 0.972590 | 0.958093 | 0.957228 | 0.961153 | 0.962342 | 0.965299 | 0.966790 | 0.967326 | 0.967738 | 0.963388 | 0.965503 | 0.964357 | 0.971579 | 0.783582 | 0.746225 | 0.740327 | 0.743999 | 0.754230 | 0.752125 | 0.750236 | 0.750630 | 0.747060 | 0.747656 | 0.740346 | 0.752428 | -0.351065 | 0.699534 |
| trans_count', Period2019-06_M | 0.462828 | 0.842734 | 0.838449 | 0.852381 | 0.852257 | 0.854790 | 0.874446 | 0.857282 | 0.857034 | 0.849991 | 0.854744 | 0.839350 | 0.850908 | 0.825166 | 0.819685 | 0.834914 | 0.862802 | 0.874092 | 0.834572 | 0.875285 | 0.874352 | 0.850917 | 0.857115 | 0.845160 | 0.891971 | 0.831176 | 0.819773 | 0.860435 | 0.843816 | 0.835081 | 0.880029 | 0.866968 | 0.818008 | 0.812641 | 0.845425 | 0.801744 | 0.897324 | 0.649411 | 0.608878 | 0.589333 | 0.616432 | 0.635399 | 0.667593 | 0.657059 | 0.674201 | 0.657397 | 0.626954 | 0.626933 | 0.531755 | 0.961990 | 0.956958 | 0.969900 | 0.966684 | 0.968830 | 1.000000 | 0.972443 | 0.973355 | 0.968201 | 0.969121 | 0.970365 | 0.976758 | 0.959636 | 0.959296 | 0.967160 | 0.968969 | 0.968955 | 0.972159 | 0.969935 | 0.970373 | 0.966377 | 0.966962 | 0.966314 | 0.974296 | 0.957975 | 0.958327 | 0.966465 | 0.961785 | 0.965365 | 0.968006 | 0.968356 | 0.970575 | 0.964871 | 0.967798 | 0.964403 | 0.973892 | 0.779776 | 0.742189 | 0.737970 | 0.740574 | 0.751734 | 0.750194 | 0.745826 | 0.747912 | 0.744242 | 0.744891 | 0.739755 | 0.748385 | -0.354487 | 0.697402 |
| trans_count', Period2019-07_M | 0.464256 | 0.842286 | 0.836692 | 0.850870 | 0.850693 | 0.853106 | 0.852801 | 0.876209 | 0.853536 | 0.847656 | 0.852597 | 0.838255 | 0.849921 | 0.820867 | 0.819231 | 0.836654 | 0.860017 | 0.876971 | 0.832209 | 0.867922 | 0.878364 | 0.850918 | 0.862387 | 0.843069 | 0.892698 | 0.833155 | 0.819775 | 0.858640 | 0.844805 | 0.831893 | 0.879515 | 0.867560 | 0.814901 | 0.807560 | 0.852940 | 0.803974 | 0.894151 | 0.652859 | 0.612684 | 0.595487 | 0.624001 | 0.644979 | 0.672743 | 0.667579 | 0.683420 | 0.660513 | 0.642347 | 0.632336 | 0.542062 | 0.962040 | 0.958207 | 0.967576 | 0.965855 | 0.969324 | 0.972443 | 1.000000 | 0.970268 | 0.967095 | 0.968110 | 0.968919 | 0.977139 | 0.960888 | 0.959602 | 0.969290 | 0.968491 | 0.970831 | 0.971742 | 0.970058 | 0.970083 | 0.967152 | 0.967789 | 0.968827 | 0.974754 | 0.958479 | 0.959338 | 0.966234 | 0.964588 | 0.967582 | 0.968223 | 0.968941 | 0.969618 | 0.964286 | 0.968400 | 0.963448 | 0.974856 | 0.787370 | 0.752561 | 0.745791 | 0.749167 | 0.759328 | 0.758590 | 0.754874 | 0.755431 | 0.752706 | 0.754414 | 0.744880 | 0.757298 | -0.352189 | 0.706792 |
| trans_count', Period2019-08_M | 0.456326 | 0.843911 | 0.833723 | 0.849431 | 0.849340 | 0.854924 | 0.851066 | 0.850850 | 0.872691 | 0.848204 | 0.852617 | 0.836124 | 0.847840 | 0.822599 | 0.818147 | 0.834789 | 0.858073 | 0.869166 | 0.835806 | 0.873284 | 0.874050 | 0.846457 | 0.857710 | 0.835894 | 0.892099 | 0.831994 | 0.827147 | 0.857949 | 0.845663 | 0.834351 | 0.882775 | 0.868081 | 0.815709 | 0.803491 | 0.850754 | 0.800960 | 0.898391 | 0.647894 | 0.604810 | 0.580015 | 0.611890 | 0.632256 | 0.663650 | 0.656354 | 0.670033 | 0.654826 | 0.629003 | 0.621729 | 0.532554 | 0.963675 | 0.957097 | 0.970505 | 0.966840 | 0.970262 | 0.973355 | 0.970268 | 1.000000 | 0.970898 | 0.969888 | 0.970644 | 0.978208 | 0.964473 | 0.961022 | 0.969049 | 0.968013 | 0.969957 | 0.971117 | 0.971865 | 0.971139 | 0.969219 | 0.968834 | 0.968869 | 0.976053 | 0.960800 | 0.961154 | 0.967372 | 0.965271 | 0.968344 | 0.970177 | 0.971664 | 0.970743 | 0.967188 | 0.967040 | 0.964941 | 0.975982 | 0.782188 | 0.741296 | 0.734382 | 0.740281 | 0.749740 | 0.747682 | 0.746185 | 0.745657 | 0.743632 | 0.744906 | 0.737392 | 0.746790 | -0.349196 | 0.694985 |
| trans_count', Period2019-09_M | 0.470820 | 0.836527 | 0.835141 | 0.848160 | 0.846332 | 0.850055 | 0.848782 | 0.850547 | 0.852616 | 0.872030 | 0.852242 | 0.834853 | 0.844658 | 0.819227 | 0.818629 | 0.831284 | 0.855195 | 0.866643 | 0.834563 | 0.870173 | 0.868886 | 0.848898 | 0.863365 | 0.843875 | 0.889714 | 0.836803 | 0.818336 | 0.854747 | 0.845130 | 0.830942 | 0.875472 | 0.864561 | 0.813548 | 0.802012 | 0.851162 | 0.802815 | 0.891175 | 0.648796 | 0.607961 | 0.585248 | 0.610647 | 0.634265 | 0.668631 | 0.657422 | 0.667713 | 0.657853 | 0.631144 | 0.622780 | 0.546454 | 0.955838 | 0.956091 | 0.965686 | 0.960908 | 0.966035 | 0.968201 | 0.967095 | 0.970898 | 1.000000 | 0.967591 | 0.963422 | 0.971991 | 0.958227 | 0.960281 | 0.965292 | 0.963815 | 0.964819 | 0.965588 | 0.967853 | 0.967188 | 0.966818 | 0.964603 | 0.963260 | 0.972173 | 0.959070 | 0.953509 | 0.962640 | 0.963302 | 0.962944 | 0.965442 | 0.969243 | 0.967643 | 0.961759 | 0.965315 | 0.963436 | 0.970957 | 0.781901 | 0.742125 | 0.735249 | 0.739738 | 0.747635 | 0.747517 | 0.745144 | 0.745429 | 0.743720 | 0.742906 | 0.736241 | 0.747488 | -0.351742 | 0.699467 |
| trans_count', Period2019-10_M | 0.459396 | 0.843813 | 0.837570 | 0.852769 | 0.852477 | 0.856715 | 0.851791 | 0.853757 | 0.852823 | 0.851707 | 0.875611 | 0.835944 | 0.852335 | 0.823674 | 0.819330 | 0.834757 | 0.855024 | 0.865995 | 0.836785 | 0.872540 | 0.876637 | 0.846841 | 0.860262 | 0.838072 | 0.896240 | 0.831566 | 0.814980 | 0.852611 | 0.845672 | 0.823723 | 0.877032 | 0.862307 | 0.811621 | 0.800364 | 0.844691 | 0.805998 | 0.895028 | 0.649143 | 0.609502 | 0.590743 | 0.615798 | 0.637817 | 0.665545 | 0.659840 | 0.667969 | 0.657038 | 0.631191 | 0.629101 | 0.541983 | 0.959507 | 0.957601 | 0.968131 | 0.966326 | 0.968919 | 0.969121 | 0.968110 | 0.969888 | 0.967591 | 1.000000 | 0.966015 | 0.975437 | 0.963051 | 0.958820 | 0.966804 | 0.965408 | 0.966243 | 0.969435 | 0.968722 | 0.970144 | 0.966738 | 0.964837 | 0.965703 | 0.975214 | 0.956953 | 0.954982 | 0.965155 | 0.962992 | 0.962608 | 0.965642 | 0.966136 | 0.967799 | 0.961547 | 0.966302 | 0.963415 | 0.971664 | 0.780135 | 0.741882 | 0.739841 | 0.740984 | 0.752118 | 0.751737 | 0.747354 | 0.749050 | 0.743306 | 0.746829 | 0.741341 | 0.749673 | -0.346931 | 0.698112 |
| trans_count', Period2019-11_M | 0.458764 | 0.847733 | 0.837294 | 0.854246 | 0.855461 | 0.857707 | 0.857041 | 0.859127 | 0.858899 | 0.850842 | 0.856160 | 0.867190 | 0.854351 | 0.824633 | 0.826910 | 0.839552 | 0.864256 | 0.873185 | 0.836454 | 0.875436 | 0.877776 | 0.850451 | 0.865578 | 0.842200 | 0.894014 | 0.826846 | 0.818779 | 0.858948 | 0.851205 | 0.832225 | 0.882089 | 0.868110 | 0.808976 | 0.813086 | 0.849273 | 0.804020 | 0.895872 | 0.656003 | 0.618120 | 0.591763 | 0.623392 | 0.643721 | 0.677579 | 0.666325 | 0.682814 | 0.662240 | 0.645508 | 0.633275 | 0.536613 | 0.959950 | 0.953654 | 0.965925 | 0.963155 | 0.965026 | 0.970365 | 0.968919 | 0.970644 | 0.963422 | 0.966015 | 1.000000 | 0.973961 | 0.956213 | 0.958580 | 0.965204 | 0.964218 | 0.966506 | 0.970872 | 0.968410 | 0.965734 | 0.965667 | 0.963942 | 0.964157 | 0.971979 | 0.956454 | 0.956478 | 0.966038 | 0.963665 | 0.962288 | 0.965855 | 0.968151 | 0.966881 | 0.962357 | 0.964773 | 0.962756 | 0.972521 | 0.784834 | 0.745240 | 0.740896 | 0.746418 | 0.753251 | 0.753602 | 0.749813 | 0.750523 | 0.744999 | 0.752355 | 0.741925 | 0.752143 | -0.347030 | 0.703649 |
| trans_count', Period2019-12_M | 0.461434 | 0.845453 | 0.837470 | 0.852151 | 0.852981 | 0.858691 | 0.854831 | 0.857003 | 0.856632 | 0.849773 | 0.855043 | 0.841385 | 0.864487 | 0.824816 | 0.823534 | 0.835840 | 0.855792 | 0.877885 | 0.837656 | 0.870269 | 0.877918 | 0.847964 | 0.859502 | 0.841384 | 0.896682 | 0.836362 | 0.819606 | 0.859462 | 0.845918 | 0.835571 | 0.883489 | 0.870231 | 0.813579 | 0.808460 | 0.852317 | 0.806325 | 0.897528 | 0.654720 | 0.612870 | 0.591939 | 0.621112 | 0.642331 | 0.675265 | 0.665693 | 0.680643 | 0.664108 | 0.638280 | 0.632905 | 0.538243 | 0.967187 | 0.962462 | 0.973813 | 0.971309 | 0.975987 | 0.976758 | 0.977139 | 0.978208 | 0.971991 | 0.975437 | 0.973961 | 1.000000 | 0.965714 | 0.966529 | 0.972510 | 0.971923 | 0.973151 | 0.976218 | 0.975463 | 0.978033 | 0.970852 | 0.973590 | 0.973751 | 0.980773 | 0.965804 | 0.965623 | 0.971334 | 0.970523 | 0.973583 | 0.975204 | 0.976127 | 0.975600 | 0.971847 | 0.972297 | 0.971675 | 0.979625 | 0.791597 | 0.752245 | 0.747135 | 0.752619 | 0.761707 | 0.761485 | 0.757786 | 0.759768 | 0.756140 | 0.757085 | 0.749638 | 0.760988 | -0.365127 | 0.703665 |
| trans_count', Period2020-01_M | 0.470286 | 0.835428 | 0.841366 | 0.848167 | 0.850313 | 0.853196 | 0.847633 | 0.851245 | 0.854692 | 0.845091 | 0.854185 | 0.833898 | 0.849414 | 0.856438 | 0.821145 | 0.832837 | 0.854218 | 0.869979 | 0.844153 | 0.870584 | 0.876997 | 0.851757 | 0.861607 | 0.839261 | 0.891403 | 0.833559 | 0.817766 | 0.861288 | 0.844403 | 0.829002 | 0.885047 | 0.865490 | 0.827901 | 0.809180 | 0.852969 | 0.797452 | 0.892282 | 0.653893 | 0.613016 | 0.595720 | 0.627100 | 0.645360 | 0.673515 | 0.659552 | 0.676958 | 0.668220 | 0.639239 | 0.635456 | 0.535485 | 0.948273 | 0.949588 | 0.959636 | 0.955482 | 0.957584 | 0.959636 | 0.960888 | 0.964473 | 0.958227 | 0.963051 | 0.956213 | 0.965714 | 1.000000 | 0.950697 | 0.957563 | 0.957966 | 0.958872 | 0.964039 | 0.961017 | 0.962486 | 0.959116 | 0.958571 | 0.959382 | 0.964691 | 0.952411 | 0.949695 | 0.958699 | 0.955954 | 0.954790 | 0.963460 | 0.960988 | 0.962955 | 0.956199 | 0.962035 | 0.954551 | 0.967092 | 0.780749 | 0.743570 | 0.736706 | 0.740364 | 0.749660 | 0.747283 | 0.746232 | 0.747268 | 0.744572 | 0.744963 | 0.736640 | 0.747145 | -0.346319 | 0.703251 |
| trans_count', Period2020-02_M | 0.475649 | 0.840678 | 0.830363 | 0.844247 | 0.844559 | 0.848352 | 0.848464 | 0.848025 | 0.850126 | 0.843241 | 0.847986 | 0.836569 | 0.845170 | 0.818418 | 0.859851 | 0.833949 | 0.858991 | 0.873607 | 0.831936 | 0.871446 | 0.865883 | 0.846598 | 0.856971 | 0.832610 | 0.883090 | 0.828894 | 0.811999 | 0.855663 | 0.845624 | 0.824469 | 0.878974 | 0.860657 | 0.809988 | 0.801170 | 0.843693 | 0.803746 | 0.891653 | 0.654266 | 0.612110 | 0.591221 | 0.620218 | 0.638852 | 0.675140 | 0.662461 | 0.674987 | 0.662093 | 0.635537 | 0.631277 | 0.535265 | 0.952335 | 0.946526 | 0.956855 | 0.955960 | 0.957696 | 0.959296 | 0.959602 | 0.961022 | 0.960281 | 0.958820 | 0.958580 | 0.966529 | 0.950697 | 1.000000 | 0.957922 | 0.959820 | 0.962377 | 0.962160 | 0.960021 | 0.960126 | 0.958401 | 0.956198 | 0.956092 | 0.964238 | 0.951608 | 0.951270 | 0.962043 | 0.958085 | 0.961246 | 0.963273 | 0.961986 | 0.962092 | 0.956747 | 0.958393 | 0.959116 | 0.965633 | 0.778878 | 0.743161 | 0.737095 | 0.741507 | 0.749530 | 0.751023 | 0.744692 | 0.745271 | 0.745141 | 0.748538 | 0.738885 | 0.748008 | -0.339621 | 0.696683 |
| trans_count', Period2020-03_M | 0.457420 | 0.838266 | 0.831369 | 0.847701 | 0.850761 | 0.849106 | 0.846618 | 0.852657 | 0.850201 | 0.842754 | 0.848301 | 0.833486 | 0.846034 | 0.819095 | 0.816803 | 0.859817 | 0.855590 | 0.873574 | 0.835972 | 0.867747 | 0.875042 | 0.847050 | 0.853675 | 0.840949 | 0.893379 | 0.836179 | 0.824670 | 0.858192 | 0.846180 | 0.834639 | 0.886203 | 0.864568 | 0.813509 | 0.806146 | 0.847782 | 0.803126 | 0.893022 | 0.647110 | 0.605830 | 0.587687 | 0.611226 | 0.638075 | 0.665131 | 0.655458 | 0.673833 | 0.660239 | 0.630193 | 0.625703 | 0.534837 | 0.959841 | 0.954489 | 0.964288 | 0.964520 | 0.963518 | 0.967160 | 0.969290 | 0.969049 | 0.965292 | 0.966804 | 0.965204 | 0.972510 | 0.957563 | 0.957922 | 1.000000 | 0.966154 | 0.966344 | 0.968227 | 0.966456 | 0.968157 | 0.963635 | 0.963901 | 0.963117 | 0.972424 | 0.958580 | 0.956884 | 0.964395 | 0.964118 | 0.964176 | 0.968115 | 0.966779 | 0.970027 | 0.961727 | 0.962930 | 0.964579 | 0.972924 | 0.778698 | 0.739112 | 0.733378 | 0.736065 | 0.747440 | 0.744063 | 0.741180 | 0.744439 | 0.740225 | 0.741849 | 0.735997 | 0.745461 | -0.358257 | 0.698150 |
| trans_count', Period2020-04_M | 0.453956 | 0.831728 | 0.826092 | 0.845664 | 0.842492 | 0.845087 | 0.842380 | 0.844680 | 0.845209 | 0.837955 | 0.841783 | 0.827109 | 0.841669 | 0.812018 | 0.817136 | 0.829212 | 0.888219 | 0.878455 | 0.835339 | 0.872674 | 0.871847 | 0.849668 | 0.858267 | 0.834632 | 0.893720 | 0.832967 | 0.817253 | 0.860176 | 0.846650 | 0.839882 | 0.884154 | 0.864139 | 0.808143 | 0.801620 | 0.853793 | 0.806219 | 0.887143 | 0.643662 | 0.598896 | 0.582942 | 0.611938 | 0.633727 | 0.663296 | 0.654970 | 0.668325 | 0.653223 | 0.632313 | 0.620448 | 0.539324 | 0.958016 | 0.956433 | 0.967789 | 0.964024 | 0.965749 | 0.968969 | 0.968491 | 0.968013 | 0.963815 | 0.965408 | 0.964218 | 0.971923 | 0.957966 | 0.959820 | 0.966154 | 1.000000 | 0.970217 | 0.969057 | 0.969617 | 0.967957 | 0.963886 | 0.964523 | 0.964749 | 0.973288 | 0.956744 | 0.959650 | 0.965385 | 0.964611 | 0.967384 | 0.967328 | 0.966789 | 0.970366 | 0.962557 | 0.967412 | 0.964423 | 0.972256 | 0.776766 | 0.736384 | 0.730123 | 0.737487 | 0.748209 | 0.745056 | 0.740429 | 0.741835 | 0.738688 | 0.741348 | 0.735272 | 0.743318 | -0.338082 | 0.695663 |
| trans_count', Period2020-05_M | 0.449220 | 0.832231 | 0.831807 | 0.844811 | 0.843618 | 0.846251 | 0.845728 | 0.847354 | 0.845661 | 0.840443 | 0.844232 | 0.829316 | 0.841110 | 0.815993 | 0.817325 | 0.829420 | 0.856457 | 0.893951 | 0.832715 | 0.870352 | 0.872896 | 0.842475 | 0.856794 | 0.838791 | 0.886409 | 0.834465 | 0.813023 | 0.854299 | 0.850693 | 0.835133 | 0.885528 | 0.868226 | 0.810175 | 0.812169 | 0.854646 | 0.802144 | 0.891476 | 0.647402 | 0.607639 | 0.586049 | 0.615337 | 0.635794 | 0.668375 | 0.662050 | 0.674031 | 0.655469 | 0.634111 | 0.621132 | 0.527816 | 0.958585 | 0.957202 | 0.968945 | 0.963009 | 0.967295 | 0.968955 | 0.970831 | 0.969957 | 0.964819 | 0.966243 | 0.966506 | 0.973151 | 0.958872 | 0.962377 | 0.966344 | 0.970217 | 1.000000 | 0.971519 | 0.968060 | 0.969905 | 0.963103 | 0.969427 | 0.965258 | 0.973261 | 0.962352 | 0.958761 | 0.963938 | 0.967475 | 0.967042 | 0.972031 | 0.971233 | 0.970944 | 0.964507 | 0.968716 | 0.965942 | 0.976048 | 0.784279 | 0.745808 | 0.740189 | 0.744327 | 0.754730 | 0.753696 | 0.749451 | 0.747893 | 0.747222 | 0.749482 | 0.740257 | 0.753715 | -0.365077 | 0.704129 |
| trans_count', Period2020-06_M | 0.464791 | 0.843920 | 0.837182 | 0.850482 | 0.851377 | 0.854238 | 0.856488 | 0.857067 | 0.856239 | 0.848818 | 0.853699 | 0.838042 | 0.853370 | 0.829243 | 0.825514 | 0.840067 | 0.863245 | 0.877946 | 0.866434 | 0.878817 | 0.878615 | 0.854636 | 0.862192 | 0.849870 | 0.891179 | 0.842782 | 0.820866 | 0.862440 | 0.851523 | 0.841168 | 0.889857 | 0.869002 | 0.821607 | 0.810846 | 0.858662 | 0.806326 | 0.901088 | 0.657238 | 0.614022 | 0.594395 | 0.622076 | 0.646584 | 0.677529 | 0.673223 | 0.678393 | 0.669629 | 0.645963 | 0.638052 | 0.543886 | 0.963606 | 0.957761 | 0.967260 | 0.966872 | 0.967208 | 0.972159 | 0.971742 | 0.971117 | 0.965588 | 0.969435 | 0.970872 | 0.976218 | 0.964039 | 0.962160 | 0.968227 | 0.969057 | 0.971519 | 1.000000 | 0.972788 | 0.971405 | 0.968057 | 0.967551 | 0.968817 | 0.974926 | 0.963245 | 0.961454 | 0.968345 | 0.968492 | 0.966830 | 0.973206 | 0.970131 | 0.971464 | 0.966081 | 0.970614 | 0.966338 | 0.976460 | 0.785753 | 0.748882 | 0.743980 | 0.747076 | 0.759274 | 0.758108 | 0.753794 | 0.754195 | 0.750486 | 0.755922 | 0.745867 | 0.757086 | -0.348231 | 0.710393 |
| trans_count', Period2020-07_M | 0.470284 | 0.840544 | 0.835129 | 0.848804 | 0.849786 | 0.853096 | 0.851485 | 0.853267 | 0.853515 | 0.848264 | 0.850074 | 0.837108 | 0.850083 | 0.820633 | 0.823889 | 0.832099 | 0.861838 | 0.871269 | 0.842639 | 0.901283 | 0.877998 | 0.851212 | 0.864724 | 0.843759 | 0.893655 | 0.835311 | 0.824153 | 0.858998 | 0.846847 | 0.833140 | 0.883911 | 0.870599 | 0.817686 | 0.805930 | 0.855844 | 0.803352 | 0.894689 | 0.650441 | 0.614259 | 0.588250 | 0.617659 | 0.636987 | 0.671868 | 0.663065 | 0.673480 | 0.662167 | 0.638019 | 0.628793 | 0.545793 | 0.960584 | 0.958784 | 0.967740 | 0.965654 | 0.965886 | 0.969935 | 0.970058 | 0.971865 | 0.967853 | 0.968722 | 0.968410 | 0.975463 | 0.961017 | 0.960021 | 0.966456 | 0.969617 | 0.968060 | 0.972788 | 1.000000 | 0.971844 | 0.966475 | 0.967544 | 0.966975 | 0.975769 | 0.960589 | 0.960040 | 0.967478 | 0.964943 | 0.966433 | 0.971137 | 0.969552 | 0.971748 | 0.964427 | 0.968909 | 0.966044 | 0.976628 | 0.782775 | 0.742504 | 0.737320 | 0.742025 | 0.751692 | 0.750248 | 0.744893 | 0.746855 | 0.744123 | 0.749437 | 0.739283 | 0.748059 | -0.343487 | 0.703322 |
| trans_count', Period2020-08_M | 0.468283 | 0.832978 | 0.827187 | 0.844382 | 0.844117 | 0.846604 | 0.844402 | 0.845227 | 0.845346 | 0.842051 | 0.845952 | 0.827290 | 0.841418 | 0.814795 | 0.813063 | 0.827833 | 0.855086 | 0.867331 | 0.837935 | 0.867992 | 0.900458 | 0.845916 | 0.856583 | 0.839161 | 0.890150 | 0.840271 | 0.814214 | 0.854820 | 0.849836 | 0.829678 | 0.882278 | 0.865589 | 0.815161 | 0.800896 | 0.850594 | 0.797764 | 0.895363 | 0.655151 | 0.612405 | 0.593187 | 0.617275 | 0.642210 | 0.671867 | 0.664802 | 0.675484 | 0.665591 | 0.634964 | 0.630334 | 0.542361 | 0.962100 | 0.958082 | 0.969285 | 0.966105 | 0.970198 | 0.970373 | 0.970083 | 0.971139 | 0.967188 | 0.970144 | 0.965734 | 0.978033 | 0.962486 | 0.960126 | 0.968157 | 0.967957 | 0.969905 | 0.971405 | 0.971844 | 1.000000 | 0.968423 | 0.966864 | 0.968526 | 0.976906 | 0.966084 | 0.960967 | 0.968531 | 0.968172 | 0.966728 | 0.971284 | 0.971935 | 0.972227 | 0.968995 | 0.968149 | 0.965690 | 0.976594 | 0.790424 | 0.749905 | 0.746597 | 0.748057 | 0.760513 | 0.756870 | 0.754907 | 0.755740 | 0.751406 | 0.753604 | 0.747308 | 0.756881 | -0.367185 | 0.703759 |
(correlation matrix output abridged — the flattened MultiIndex labels such as `trans_count', Period2020-09_M` are the monthly `trans_count` columns; the rows shown cover `trans_count` for Period2020-09_M through Period2022-12_M plus `age` and `Target`. The pattern that matters: neighboring months' `trans_count` columns correlate ≈0.95–0.98 with one another, `Target` correlates most strongly (≈0.89–0.91) with the 2022 monthly transaction counts and ≈0.64–0.77 with earlier months, and `age` is negatively correlated with every activity feature, roughly −0.18 to −0.50.)
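A matrix this wide is easier to consume programmatically than to read. A minimal sketch of ranking features by absolute correlation with the target — the tiny `demo` frame below is a stand-in (the real pivoted frame and its column names are from this notebook; everything else here is illustrative):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pivoted transaction frame: one strong predictor,
# one weaker negatively-related feature, plus noise.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'trans_count_2022-12': rng.normal(size=200),
    'age': rng.normal(size=200),
})
demo['Target'] = (0.9 * demo['trans_count_2022-12']
                  - 0.4 * demo['age']
                  + rng.normal(scale=0.2, size=200))

# Correlation of every feature with the target, ranked by magnitude.
corr = demo.corr()['Target'].drop('Target')
top = corr.abs().sort_values(ascending=False)
print(top)
```

Applied to the real frame, this surfaces the same conclusion as the dump above: recent `trans_count` months dominate, and `age` carries a consistent negative signal.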
from sklearn.model_selection import train_test_split
data_df, test_df = train_test_split(trans_df, test_size=0.25, random_state=42)
train_df, val_df = train_test_split(data_df, test_size=0.25, random_state=42)
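Chaining two 25% splits like this yields roughly 56.25% / 18.75% / 25% train / validation / test, since the second split takes 25% of the remaining 75%. A quick sketch on a dummy array (the proportions are the point; the array is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1000 rows: first split holds out 25% for test, second takes 25% of
# the remainder for validation, leaving 0.75 * 0.75 = 56.25% to train on.
X = np.arange(1000)
data, test = train_test_split(X, test_size=0.25, random_state=42)
train, val = train_test_split(data, test_size=0.25, random_state=42)
print(len(train), len(val), len(test))  # 562 188 250
```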
from sklearn.preprocessing import OneHotEncoder
train_gender = train_df['gender'].values.reshape(-1, 1)
val_gender = val_df['gender'].values.reshape(-1, 1)
test_gender = test_df['gender'].values.reshape(-1, 1)
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
encoder.fit(train_gender)
OneHotEncoder(handle_unknown='ignore', sparse_output=False)
train_gender_encoded = encoder.transform(train_gender)
val_gender_encoded = encoder.transform(val_gender)
test_gender_encoded = encoder.transform(test_gender)
train_encoded = pd.DataFrame(train_gender_encoded, columns=encoder.get_feature_names_out(['gender']), index=train_df.index)
val_encoded = pd.DataFrame(val_gender_encoded, columns=encoder.get_feature_names_out(['gender']), index=val_df.index)
test_encoded = pd.DataFrame(test_gender_encoded, columns=encoder.get_feature_names_out(['gender']), index=test_df.index)
train_df = pd.concat([train_df.drop('gender', axis=1), train_encoded], axis=1)
val_df = pd.concat([val_df.drop('gender', axis=1), val_encoded], axis=1)
test_df = pd.concat([test_df.drop('gender', axis=1), test_encoded], axis=1)
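Because the encoder was fit with `handle_unknown='ignore'`, a gender value that never appeared in the training set encodes to an all-zero row instead of raising an error. A toy sketch (values here are made up):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore')   # default sparse output; works across sklearn versions
enc.fit(np.array([['M'], ['F'], ['M']]))

print(enc.categories_)                          # [array(['F', 'M'], ...)]
print(enc.transform([['F']]).toarray())         # [[1. 0.]]
print(enc.transform([['zzz']]).toarray())       # [[0. 0.]] -> unseen value, no error
```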
train_df.head()
(Output abridged: a 5-row preview of `train_df` containing `acct_num`, 48 monthly `total_amt` columns (2018-12 through 2022-11), 49 monthly `trans_count` columns (2018-12 through 2022-12), `age`, `job`, `city`, `age_group`, `Target`, and the new one-hot columns `gender_F` and `gender_M`.)
job_frequency = train_df['job'].value_counts(normalize=True)
train_df['job_frequency'] = train_df['job'].map(job_frequency)
val_df['job_frequency'] = val_df['job'].map(job_frequency)
test_df['job_frequency'] = test_df['job'].map(job_frequency)
# Jobs unseen in training map to NaN; impute with the mean training frequency
val_df['job_frequency'] = val_df['job_frequency'].fillna(train_df['job_frequency'].mean())
test_df['job_frequency'] = test_df['job_frequency'].fillna(train_df['job_frequency'].mean())
city_frequency = train_df['city'].value_counts(normalize=True)
train_df['city_frequency'] = train_df['city'].map(city_frequency)
val_df['city_frequency'] = val_df['city'].map(city_frequency)
test_df['city_frequency'] = test_df['city'].map(city_frequency)
# Cities unseen in training map to NaN; impute with the mean training frequency
val_df['city_frequency'] = val_df['city_frequency'].fillna(train_df['city_frequency'].mean())
test_df['city_frequency'] = test_df['city_frequency'].fillna(train_df['city_frequency'].mean())
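The frequency-encoding pattern used here can be illustrated on a toy column: categories map to their training-set share, and categories unseen in training come back as NaN, which is why a fillna step is needed (a sketch; the job titles are made up):

```python
import pandas as pd

train_jobs = pd.Series(['nurse', 'nurse', 'teacher', 'pilot'])
freq = train_jobs.value_counts(normalize=True)   # nurse 0.50, teacher 0.25, pilot 0.25

val_jobs = pd.Series(['teacher', 'astronaut'])   # 'astronaut' never seen in training
encoded = val_jobs.map(freq)                     # [0.25, NaN]

# Impute the unseen category, e.g. with the mean training frequency
encoded = encoded.fillna(freq.mean())
print(encoded.tolist())                          # [0.25, 0.333...]
```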
age_group_mapping = {
'<18': 1,
'18-24': 2,
'25-34': 3,
'35-44': 4,
'45-54': 5,
'55-64': 6,
'65+': 7
}
train_df['age_group_encoded'] = train_df['age_group'].map(age_group_mapping)
val_df['age_group_encoded'] = val_df['age_group'].map(age_group_mapping)
test_df['age_group_encoded'] = test_df['age_group'].map(age_group_mapping)
train_df.drop(['age', 'age_group','job', 'city'], axis=1, inplace=True)
val_df.drop(['age', 'age_group', 'job', 'city'], axis=1, inplace=True)
test_df.drop(['age', 'age_group', 'job', 'city'], axis=1, inplace=True)
train_df.head()
(Output abridged: the same 5-row preview of `train_df`, now with `age`, `age_group`, `job`, and `city` dropped and the engineered columns `gender_F`, `gender_M`, `job_frequency`, `city_frequency`, and `age_group_encoded` appended alongside `Target`.)
train_df.isna().sum()
(Output abridged: every column of `train_df` — `acct_num`, the monthly `total_amt` and `trans_count` columns, `Target`, `gender_F`, `gender_M`, `job_frequency`, `city_frequency`, and `age_group_encoded` — reports 0 missing values.)
val_df.isna().sum()
(Output abridged: every column of `val_df` reports 0 missing values.)
test_df.isna().sum()
(Output abridged: every column of `test_df` reports 0 missing values.)
X_train = train_df.drop(["acct_num", "Target"], axis=1)
y_train = train_df["Target"]
X_val = val_df.drop(["acct_num", "Target"], axis=1)
y_val = val_df["Target"]
X_test = test_df.drop(["acct_num", "Target"], axis=1)
y_test = test_df["Target"]
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
X_test_scaled = scaler.transform(X_test)
# Keep the original row indices so X and y stay aligned
X_train = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_val = pd.DataFrame(X_val_scaled, columns=X_val.columns, index=X_val.index)
X_test = pd.DataFrame(X_test_scaled, columns=X_test.columns, index=X_test.index)
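The key point in the scaling step is that the mean and standard deviation come from the training set only and are then reused on validation and test data. A minimal sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_tr = np.array([[0.0], [10.0]])    # train: mean 5, std 5
X_va = np.array([[5.0], [20.0]])    # scaled with the *train* statistics

sc = StandardScaler().fit(X_tr)
print(sc.mean_, sc.scale_)          # [5.] [5.]
print(sc.transform(X_va).ravel())   # [0. 3.] -> (5-5)/5, (20-5)/5
```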
In the regression analysis, we developed and evaluated eight models to predict the next month's spending:

- **Baseline**: predicts the mean of the target variable, serving as a benchmark and reference point for all other models.
- **Multivariate Linear Regression**: a linear regression model over multiple predictor variables.
- **Lasso Regression**: linear regression with L1 regularization on the coefficients, promoting sparsity and implicit feature selection.
- **Ridge Regression**: linear regression with L2 regularization, helping to reduce overfitting.
- **ElasticNet Regression**: linear regression combining the L1 and L2 penalties, balancing feature selection against coefficient shrinkage.
- **Decision Tree Regressor**: a non-linear model based on recursive partitioning.
- **Random Forest Regressor**: an ensemble method that averages many decision trees.
- **Gradient Boosting Regressor**: an ensemble method that sequentially trains weak learners (shallow decision trees) to correct the errors of their predecessors.
After splitting the data, we standardized the features with a scaler fit on the training set only, so that all features were on a similar scale, no single feature dominated training, and no statistics leaked from the validation or test sets. We also applied one-hot encoding to the 'gender' variable and frequency encoding to the 'job' and 'city' columns, and mapped the ordinal 'age_group' variable to integers 1 through 7.
We split the data into training, validation, and test sets for a fair evaluation: 25% was held out for testing, and the remaining 75% was split again into training (75%) and validation (25%), giving roughly 56% / 19% / 25% of the data overall. Each model was evaluated with mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) on both the training and validation sets.
Feature selection techniques, including feature engineering, were applied to all models to identify the most relevant predictors (see the appendix for details). For the tree-based models (Decision Tree, Random Forest, and Gradient Boosting Regressors), hyperparameters were tuned to optimize performance.
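The three reported metrics are closely related: RMSE is simply the square root of MSE, and MSE punishes large errors more heavily than MAE. A tiny sketch with made-up numbers makes this concrete:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 330.0])

mae = mean_absolute_error(y_true, y_pred)   # (10 + 10 + 30) / 3 ~= 16.67
mse = mean_squared_error(y_true, y_pred)    # (100 + 100 + 900) / 3 ~= 366.67
rmse = np.sqrt(mse)                         # ~19.15; the single large error dominates
print(mae, mse, rmse)
```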
import matplotlib.pyplot as plt  # used for the diagnostic plots below
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
y_mean = y_train.mean()
y_base = np.full(y_train.shape, y_mean)
mae = mean_absolute_error(y_train, y_base)
mse = mean_squared_error(y_train, y_base)
rmse = mean_squared_error(y_train, y_base, squared=False)
baseline_train = pd.DataFrame({'mae':mae,
'mse':mse,
'rmse':rmse}, index=['Baseline_Train'])
baseline_train
| mae | mse | rmse | |
|---|---|---|---|
| Baseline_Train | 8307.422361 | 1.105862e+08 | 10515.995863 |
y_mean = y_test.mean()
y_base = np.full(y_test.shape, y_mean)
mae = mean_absolute_error(y_test, y_base)
mse = mean_squared_error(y_test, y_base)
rmse = mean_squared_error(y_test, y_base, squared=False)
baseline_test = pd.DataFrame({'mae':mae,
'mse':mse,
'rmse':rmse}, index=['Baseline_Test'])
baseline_test
| mae | mse | rmse | |
|---|---|---|---|
| Baseline_Test | 7751.203825 | 1.000556e+08 | 10002.777477 |
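Using the mean as the baseline prediction is not arbitrary: among all constant predictions, the mean minimizes MSE (the median would minimize MAE). A quick numeric check on toy values:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 10.0])

def mse_of_constant(c):
    # MSE of predicting the single constant c for every row
    return np.mean((y - c) ** 2)

candidates = np.linspace(0, 12, 1201)   # constants 0.00, 0.01, ..., 12.00
best = candidates[np.argmin([mse_of_constant(c) for c in candidates])]
print(best, y.mean())                   # both ~4.0: the mean minimizes MSE
```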
def multilinear(X_train, y_train, X_test, y_test, index_train, index_test):
    reg = LinearRegression()
    reg.fit(X_train, y_train)
    y_preds_train = reg.predict(X_train)
    y_preds_test = reg.predict(X_test)
    # Training-set metrics
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    multi_train = pd.DataFrame({'mae': mae,
                                'mse': mse,
                                'rmse': rmse},
                               index=[index_train])
    # Evaluation-set metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    multi_test = pd.DataFrame({'mae': mae,
                               'mse': mse,
                               'rmse': rmse},
                              index=[index_test])
    multi_models = pd.concat([multi_train, multi_test])
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
    # Plot the predicted vs. actual target values for the training set
    axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')
    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(f'{index_train}: Comparison of Actual vs. Predicted Target')
    axes[0].legend()
    # Same plot for the evaluation set
    axes[1].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')
    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(f'{index_test}: Comparison of Actual vs. Predicted Target')
    axes[1].legend()
    return multi_models
multi = multilinear(X_train, y_train, X_val, y_val, 'MultiLinear_Train', 'MultiLinear_Val')
multi
| | mae | mse | rmse |
|---|---|---|---|
| MultiLinear_Train | 2079.423872 | 9.298125e+06 | 3049.282748 |
| MultiLinear_Val | 2267.698408 | 1.029542e+07 | 3208.647385 |
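The MAE/MSE/RMSE bookkeeping in multilinear is repeated almost verbatim in the lasso, ridge, and elastic-net functions that follow. One possible refactor (a sketch, not part of the notebook's code) collects each set of metrics into a one-row DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

def metric_row(y_true, y_pred, index_name):
    """Return a one-row DataFrame with MAE, MSE, and RMSE for one prediction set."""
    mse = mean_squared_error(y_true, y_pred)
    return pd.DataFrame({'mae': mean_absolute_error(y_true, y_pred),
                         'mse': mse,
                         'rmse': np.sqrt(mse)},  # RMSE is just sqrt(MSE)
                        index=[index_name])

row = metric_row([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], 'Demo')
print(row.round(4))
```

Each model function could then build its summary as pd.concat([metric_row(y_train, y_preds_train, index_train), metric_row(y_test, y_preds_test, index_test)]).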
correlation = {}
for column in trans_df.columns:
    if column not in ['acct_num', 'Target']:
        if trans_df[column].dtype in [np.int64, np.float64]:
            correlation[column] = np.abs(round(trans_df['Target'].corr(trans_df[column]), 2))
sorted_correlation = dict(sorted(correlation.items(), key=lambda x: x[1], reverse=True))
# Feature set 1: absolute correlation with the target above 0.5
feature_var = []
for key in sorted_correlation.keys():
    if sorted_correlation[key] > 0.5:
        feature_var.append(key)
X_train_feature1 = train_df[feature_var]
X_val_feature1 = val_df[feature_var]
X_test_feature1 = test_df[feature_var]
X_train_feature1_scaled = scaler.fit_transform(X_train_feature1)
X_val_feature1_scaled = scaler.transform(X_val_feature1)
X_test_feature1_scaled = scaler.transform(X_test_feature1)
X_train_feature1 = pd.DataFrame(X_train_feature1_scaled, columns=X_train_feature1.columns)
X_val_feature1 = pd.DataFrame(X_val_feature1_scaled, columns=X_val_feature1.columns)
X_test_feature1 = pd.DataFrame(X_test_feature1_scaled, columns=X_test_feature1.columns)
multi_feature1 = multilinear(X_train_feature1, y_train, X_val_feature1, y_val, 'MultiLinear_Feature1_Train', 'MultiLinear_Feature1_Val')
multi_feature1
| | mae | mse | rmse |
|---|---|---|---|
| MultiLinear_Feature1_Train | 2070.482628 | 9.344101e+06 | 3056.812263 |
| MultiLinear_Feature1_Val | 2230.827601 | 1.010146e+07 | 3178.280116 |
# Feature set 2: absolute correlation with the target above 0.7
feature_var = []
for key in sorted_correlation.keys():
    if sorted_correlation[key] > 0.7:
        feature_var.append(key)
X_train_feature2 = train_df[feature_var]
X_val_feature2 = val_df[feature_var]
X_test_feature2 = test_df[feature_var]
X_train_feature2_scaled = scaler.fit_transform(X_train_feature2)
X_val_feature2_scaled = scaler.transform(X_val_feature2)
X_test_feature2_scaled = scaler.transform(X_test_feature2)
X_train_feature2 = pd.DataFrame(X_train_feature2_scaled, columns=X_train_feature2.columns)
X_val_feature2 = pd.DataFrame(X_val_feature2_scaled, columns=X_val_feature2.columns)
X_test_feature2 = pd.DataFrame(X_test_feature2_scaled, columns=X_test_feature2.columns)
multi_feature2 = multilinear(X_train_feature2, y_train, X_val_feature2, y_val, 'MultiLinear_Feature2_Train', 'MultiLinear_Feature2_Val')
multi_feature2
| | mae | mse | rmse |
|---|---|---|---|
| MultiLinear_Feature2_Train | 2093.693198 | 1.030223e+07 | 3209.708360 |
| MultiLinear_Feature2_Val | 2123.415974 | 9.319224e+06 | 3052.740411 |
# Feature set 3: absolute correlation with the target above 0.8
feature_var = []
for key in sorted_correlation.keys():
    if sorted_correlation[key] > 0.8:
        feature_var.append(key)
X_train_feature3 = train_df[feature_var]
X_val_feature3 = val_df[feature_var]
X_test_feature3 = test_df[feature_var]
X_train_feature3_scaled = scaler.fit_transform(X_train_feature3)
X_val_feature3_scaled = scaler.transform(X_val_feature3)
X_test_feature3_scaled = scaler.transform(X_test_feature3)
X_train_feature3 = pd.DataFrame(X_train_feature3_scaled, columns=X_train_feature3.columns)
X_val_feature3 = pd.DataFrame(X_val_feature3_scaled, columns=X_val_feature3.columns)
X_test_feature3 = pd.DataFrame(X_test_feature3_scaled, columns=X_test_feature3.columns)
multi_feature3 = multilinear(X_train_feature3, y_train, X_val_feature3, y_val, 'MultiLinear_Feature3_Train', 'MultiLinear_Feature3_Val')
multi_feature3
| | mae | mse | rmse |
|---|---|---|---|
| MultiLinear_Feature3_Train | 2278.747762 | 1.223434e+07 | 3497.762131 |
| MultiLinear_Feature3_Val | 2124.895843 | 9.739218e+06 | 3120.772055 |
multi_model = pd.concat([baseline_train, baseline_test, multi, multi_feature1, multi_feature2, multi_feature3])
multi_model
| | mae | mse | rmse |
|---|---|---|---|
| Baseline_Train | 8307.422361 | 1.105862e+08 | 10515.995863 |
| Baseline_Test | 7751.203825 | 1.000556e+08 | 10002.777477 |
| MultiLinear_Train | 2079.423872 | 9.298125e+06 | 3049.282748 |
| MultiLinear_Val | 2267.698408 | 1.029542e+07 | 3208.647385 |
| MultiLinear_Feature1_Train | 2070.482628 | 9.344101e+06 | 3056.812263 |
| MultiLinear_Feature1_Val | 2230.827601 | 1.010146e+07 | 3178.280116 |
| MultiLinear_Feature2_Train | 2093.693198 | 1.030223e+07 | 3209.708360 |
| MultiLinear_Feature2_Val | 2123.415974 | 9.319224e+06 | 3052.740411 |
| MultiLinear_Feature3_Train | 2278.747762 | 1.223434e+07 | 3497.762131 |
| MultiLinear_Feature3_Val | 2124.895843 | 9.739218e+06 | 3120.772055 |
from sklearn.linear_model import Lasso
def lassomodel(X_train, y_train, X_test, y_test, index_train, index_test):
    lasso_reg = Lasso()
    lasso_reg.fit(X_train, y_train)
    y_preds_train = lasso_reg.predict(X_train)
    y_preds_test = lasso_reg.predict(X_test)
    # Training-set metrics
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    lasso_train = pd.DataFrame({'mae': mae,
                                'mse': mse,
                                'rmse': rmse},
                               index=[index_train])
    # Test/validation-set metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    lasso_test = pd.DataFrame({'mae': mae,
                               'mse': mse,
                               'rmse': rmse},
                              index=[index_test])
    lasso_models = pd.concat([lasso_train, lasso_test])
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
    # Plot the predicted vs actual target values for each set
    axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Ideal (y = x)')
    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(f'{index_train}: Comparison of Actual vs. Predicted Target')
    axes[0].legend()
    axes[1].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Ideal (y = x)')
    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(f'{index_test}: Comparison of Actual vs. Predicted Target')
    axes[1].legend()
    plt.show()
    return lasso_models
lasso = lassomodel(X_train, y_train, X_val, y_val, 'Lasso_Train', 'Lasso_Val')
lasso
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 2.229e+09, tolerance: 6.038e+06 model = cd_fast.enet_coordinate_descent(
| | mae | mse | rmse |
|---|---|---|---|
| Lasso_Train | 2093.725380 | 9.353681e+06 | 3058.378796 |
| Lasso_Val | 2279.843057 | 1.024395e+07 | 3200.616371 |
lasso_feature1 = lassomodel(X_train_feature1, y_train, X_val_feature1, y_val, 'Lasso_FEATURE1_Train', 'Lasso_FEATURE1_Val')
lasso_feature1
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.766e+09, tolerance: 6.038e+06 model = cd_fast.enet_coordinate_descent(
| | mae | mse | rmse |
|---|---|---|---|
| Lasso_FEATURE1_Train | 2278.114357 | 1.223520e+07 | 3497.885577 |
| Lasso_FEATURE1_Val | 2123.981346 | 9.719629e+06 | 3117.631893 |
lasso_feature2 = lassomodel(X_train_feature2, y_train, X_val_feature2, y_val, 'Lasso_FEATURE2_Train', 'Lasso_FEATURE2_Val')
lasso_feature2
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.137e+09, tolerance: 6.038e+06 model = cd_fast.enet_coordinate_descent(
| | mae | mse | rmse |
|---|---|---|---|
| Lasso_FEATURE2_Train | 2092.325287 | 1.030345e+07 | 3209.898411 |
| Lasso_FEATURE2_Val | 2117.264295 | 9.286441e+06 | 3047.366207 |
lasso_feature3 = lassomodel(X_train_feature3, y_train, X_val_feature3, y_val, 'Lasso_FEATURE3_Train', 'Lasso_FEATURE3_Val')
lasso_feature3
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.340e+09, tolerance: 6.038e+06 model = cd_fast.enet_coordinate_descent(
| | mae | mse | rmse |
|---|---|---|---|
| Lasso_FEATURE3_Train | 2279.940306 | 1.223561e+07 | 3497.943503 |
| Lasso_FEATURE3_Val | 2127.655356 | 9.759305e+06 | 3123.988600 |
lasso_model = pd.concat([lasso, lasso_feature1, lasso_feature2, lasso_feature3])
lasso_model
| | mae | mse | rmse |
|---|---|---|---|
| Lasso_Train | 2093.725380 | 9.353681e+06 | 3058.378796 |
| Lasso_Val | 2279.843057 | 1.024395e+07 | 3200.616371 |
| Lasso_FEATURE1_Train | 2278.114357 | 1.223520e+07 | 3497.885577 |
| Lasso_FEATURE1_Val | 2123.981346 | 9.719629e+06 | 3117.631893 |
| Lasso_FEATURE2_Train | 2092.325287 | 1.030345e+07 | 3209.898411 |
| Lasso_FEATURE2_Val | 2117.264295 | 9.286441e+06 | 3047.366207 |
| Lasso_FEATURE3_Train | 2279.940306 | 1.223561e+07 | 3497.943503 |
| Lasso_FEATURE3_Val | 2127.655356 | 9.759305e+06 | 3123.988600 |
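The ConvergenceWarning printed for every Lasso fit means the coordinate-descent solver hit its default iteration cap (max_iter=1000) before converging. A straightforward remedy, sketched on synthetic data below rather than the notebook's features, is to raise max_iter; alpha stays at the default 1.0 that Lasso() uses above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic regression problem standing in for the notebook's features.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

# Give coordinate descent far more room than the default 1000 iterations.
lasso_reg = Lasso(max_iter=50_000)
lasso_reg.fit(X, y)
print(lasso_reg.n_iter_)  # iterations actually used, at most the cap
```

Scaling the features (as done for the feature-selected sets) also tends to help convergence, which is consistent with the warning's own advice.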
from sklearn.linear_model import Ridge
def ridgemodel(X_train, y_train, X_test, y_test, index_train, index_test):
    ridge_reg = Ridge()
    ridge_reg.fit(X_train, y_train)
    y_preds_train = ridge_reg.predict(X_train)
    y_preds_test = ridge_reg.predict(X_test)
    # Training-set metrics
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    ridge_train = pd.DataFrame({'mae': mae,
                                'mse': mse,
                                'rmse': rmse},
                               index=[index_train])
    # Test/validation-set metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    ridge_test = pd.DataFrame({'mae': mae,
                               'mse': mse,
                               'rmse': rmse},
                              index=[index_test])
    ridge_models = pd.concat([ridge_train, ridge_test])
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
    # Plot the predicted vs actual target values for each set
    axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Ideal (y = x)')
    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(f'{index_train}: Comparison of Actual vs. Predicted Target')
    axes[0].legend()
    axes[1].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Ideal (y = x)')
    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(f'{index_test}: Comparison of Actual vs. Predicted Target')
    axes[1].legend()
    plt.show()
    return ridge_models
ridge = ridgemodel(X_train, y_train, X_val, y_val, 'Ridge_Train', 'Ridge_Val')
ridge
| | mae | mse | rmse |
|---|---|---|---|
| Ridge_Train | 2061.966576 | 9.326546e+06 | 3053.939442 |
| Ridge_Val | 2211.667197 | 9.930921e+06 | 3151.336341 |
ridge_feature1 = ridgemodel(X_train_feature1, y_train, X_val_feature1, y_val, 'Ridge_FEATURE1_Train', 'Ridge_FEATURE1_Val')
ridge_feature1
| | mae | mse | rmse |
|---|---|---|---|
| Ridge_FEATURE1_Train | 2274.841596 | 1.225304e+07 | 3500.433978 |
| Ridge_FEATURE1_Val | 2108.951082 | 9.600463e+06 | 3098.461357 |
ridge_feature2 = ridgemodel(X_train_feature2, y_train, X_val_feature2, y_val, 'Ridge_FEATURE2_Train', 'Ridge_FEATURE2_Val')
ridge_feature2
| | mae | mse | rmse |
|---|---|---|---|
| Ridge_FEATURE2_Train | 2084.101832 | 1.031979e+07 | 3212.443623 |
| Ridge_FEATURE2_Val | 2102.691899 | 9.118768e+06 | 3019.729731 |
ridge_feature3 = ridgemodel(X_train_feature3, y_train, X_val_feature3, y_val, 'Ridge_FEATURE3_Train', 'Ridge_FEATURE3_Val')
ridge_feature3
| | mae | mse | rmse |
|---|---|---|---|
| Ridge_FEATURE3_Train | 2278.747159 | 1.223434e+07 | 3497.762131 |
| Ridge_FEATURE3_Val | 2124.893956 | 9.739202e+06 | 3120.769408 |
ridge_model = pd.concat([ridge, ridge_feature1, ridge_feature2, ridge_feature3])
ridge_model
| | mae | mse | rmse |
|---|---|---|---|
| Ridge_Train | 2061.966576 | 9.326546e+06 | 3053.939442 |
| Ridge_Val | 2211.667197 | 9.930921e+06 | 3151.336341 |
| Ridge_FEATURE1_Train | 2274.841596 | 1.225304e+07 | 3500.433978 |
| Ridge_FEATURE1_Val | 2108.951082 | 9.600463e+06 | 3098.461357 |
| Ridge_FEATURE2_Train | 2084.101832 | 1.031979e+07 | 3212.443623 |
| Ridge_FEATURE2_Val | 2102.691899 | 9.118768e+06 | 3019.729731 |
| Ridge_FEATURE3_Train | 2278.747159 | 1.223434e+07 | 3497.762131 |
| Ridge_FEATURE3_Val | 2124.893956 | 9.739202e+06 | 3120.769408 |
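Ridge() above relies on the default penalty alpha=1.0. A natural extension, sketched here on synthetic data rather than as part of the notebook's pipeline, is to let RidgeCV choose the regularisation strength by cross-validation:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=300, n_features=20, noise=15.0, random_state=42)

# Search a log-spaced grid of penalties; by default RidgeCV uses an
# efficient leave-one-out cross-validation scheme.
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)
print(ridge_cv.alpha_)  # the selected penalty, one of the grid values
```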
from sklearn.linear_model import ElasticNet
def elasticnet(X_train, y_train, X_test, y_test, index_train, index_test):
    elastic_reg = ElasticNet()
    elastic_reg.fit(X_train, y_train)
    y_preds_train = elastic_reg.predict(X_train)
    y_preds_test = elastic_reg.predict(X_test)
    # Training-set metrics
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    elastic_train = pd.DataFrame({'mae': mae,
                                  'mse': mse,
                                  'rmse': rmse},
                                 index=[index_train])
    # Test/validation-set metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    elastic_test = pd.DataFrame({'mae': mae,
                                 'mse': mse,
                                 'rmse': rmse},
                                index=[index_test])
    elastic_models = pd.concat([elastic_train, elastic_test])
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
    # Plot the predicted vs actual target values for each set
    axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Ideal (y = x)')
    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(f'{index_train}: Comparison of Actual vs. Predicted Target')
    axes[0].legend()
    axes[1].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Ideal (y = x)')
    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(f'{index_test}: Comparison of Actual vs. Predicted Target')
    axes[1].legend()
    plt.show()
    return elastic_models
elastic = elasticnet(X_train, y_train, X_val, y_val, 'Elastic_Train', 'Elastic_Val')
elastic
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 1.184e+09, tolerance: 6.038e+06 model = cd_fast.enet_coordinate_descent(
| | mae | mse | rmse |
|---|---|---|---|
| Elastic_Train | 2284.880425 | 1.238910e+07 | 3519.815887 |
| Elastic_Val | 2073.707562 | 8.579011e+06 | 2928.994854 |
elastic_feature1 = elasticnet(X_train_feature1, y_train, X_val_feature1, y_val, 'Elastic_FEATURE1_Train', 'Elastic_FEATURE1_Val')
elastic_feature1
| | mae | mse | rmse |
|---|---|---|---|
| Elastic_FEATURE1_Train | 2469.341627 | 1.406453e+07 | 3750.270443 |
| Elastic_FEATURE1_Val | 2185.184604 | 9.342266e+06 | 3056.512135 |
elastic_feature2 = elasticnet(X_train_feature2, y_train, X_val_feature2, y_val, 'Elastic_FEATURE2_Train', 'Elastic_FEATURE2_Val')
elastic_feature2
| | mae | mse | rmse |
|---|---|---|---|
| Elastic_FEATURE2_Train | 2321.397873 | 1.289433e+07 | 3590.867219 |
| Elastic_FEATURE2_Val | 2109.088513 | 8.697749e+06 | 2949.194714 |
elastic_feature3 = elasticnet(X_train_feature3, y_train, X_val_feature3, y_val, 'Elastic_FEATURE3_Train', 'Elastic_FEATURE3_Val')
elastic_feature3
/opt/homebrew/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:631: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.341e+09, tolerance: 6.038e+06 model = cd_fast.enet_coordinate_descent(
| | mae | mse | rmse |
|---|---|---|---|
| Elastic_FEATURE3_Train | 2279.672369 | 1.223548e+07 | 3497.925509 |
| Elastic_FEATURE3_Val | 2126.979548 | 9.753365e+06 | 3123.037799 |
elastic_model = pd.concat([elastic, elastic_feature1, elastic_feature2, elastic_feature3])
elastic_model
| | mae | mse | rmse |
|---|---|---|---|
| Elastic_Train | 2284.880425 | 1.238910e+07 | 3519.815887 |
| Elastic_Val | 2073.707562 | 8.579011e+06 | 2928.994854 |
| Elastic_FEATURE1_Train | 2469.341627 | 1.406453e+07 | 3750.270443 |
| Elastic_FEATURE1_Val | 2185.184604 | 9.342266e+06 | 3056.512135 |
| Elastic_FEATURE2_Train | 2321.397873 | 1.289433e+07 | 3590.867219 |
| Elastic_FEATURE2_Val | 2109.088513 | 8.697749e+06 | 2949.194714 |
| Elastic_FEATURE3_Train | 2279.672369 | 1.223548e+07 | 3497.925509 |
| Elastic_FEATURE3_Val | 2126.979548 | 9.753365e+06 | 3123.037799 |
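ElasticNet() above likewise uses its defaults (alpha=1.0, l1_ratio=0.5). ElasticNetCV can tune both knobs jointly; the snippet below is a sketch on synthetic data, not something run in the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV

X, y = make_regression(n_samples=300, n_features=20, noise=15.0, random_state=42)

# Cross-validate over several L1/L2 mixing ratios; candidate alphas are
# generated automatically along the regularisation path.
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, max_iter=50_000)
enet_cv.fit(X, y)
print(enet_cv.l1_ratio_, enet_cv.alpha_)
```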
from sklearn.tree import DecisionTreeRegressor, plot_tree
def decision_tree_regression(X_train, y_train, X_test, y_test, index_train, index_test):
    regressor = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)
    # Training-set metrics
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    dt_train = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=[index_train])
    # Validation-set metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    dt_test = pd.DataFrame({'mae': mae,
                            'mse': mse,
                            'rmse': rmse},
                           index=[index_test])
    dt_models = pd.concat([dt_train, dt_test])
    # Visualise the fitted tree (the same fitted tree is shown in both panels)
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(24, 6))
    axes[0].set_title(f'Decision Tree: {index_train}')
    plot_tree(regressor, ax=axes[0], filled=True)
    axes[1].set_title(f'Decision Tree: {index_test}')
    plot_tree(regressor, ax=axes[1], filled=True)
    plt.tight_layout()
    plt.show()
    # Rank and plot feature importances
    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    feature_names = X_train.columns.values
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Decision Tree Regressor - Feature Importance")
    plt.show()
    return dt_models
dt = decision_tree_regression(X_train, y_train, X_val, y_val, 'Dtregressor_Train', 'Dtregressor_Val')
dt
| | mae | mse | rmse |
|---|---|---|---|
| Dtregressor_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_Val | 3022.997923 | 1.981871e+07 | 4451.821535 |
dt_feature1 = decision_tree_regression(X_train_feature1, y_train, X_val_feature1, y_val, 'Dtregressor_FEATURE1_Train', 'Dtregressor_FEATURE1_Val')
dt_feature1
| | mae | mse | rmse |
|---|---|---|---|
| Dtregressor_FEATURE1_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE1_Val | 3142.906175 | 2.105998e+07 | 4589.114999 |
dt_feature2 = decision_tree_regression(X_train_feature2, y_train, X_val_feature2, y_val, 'Dtregressor_FEATURE2_Train', 'Dtregressor_FEATURE2_Val')
dt_feature2
| | mae | mse | rmse |
|---|---|---|---|
| Dtregressor_FEATURE2_Train | 0.00000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE2_Val | 3379.59929 | 2.756409e+07 | 5250.151896 |
dt_feature3 = decision_tree_regression(X_train_feature3, y_train, X_val_feature3, y_val, 'Dtregressor_FEATURE3_Train', 'Dtregressor_FEATURE3_Val')
dt_feature3
| | mae | mse | rmse |
|---|---|---|---|
| Dtregressor_FEATURE3_Train | 0.000000 | 0.000000e+00 | 0.0000 |
| Dtregressor_FEATURE3_Val | 3090.758525 | 2.071248e+07 | 4551.0961 |
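The zero training error in every row above is the signature of an unpruned decision tree memorising its training set, not of a good model; the gap to the validation error is pure overfitting. The effect is easy to reproduce on synthetic data:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(size=200)

# With no depth limit the tree keeps splitting until each leaf holds a
# single (unique) sample, so it reproduces the training targets exactly.
tree = DecisionTreeRegressor(random_state=42).fit(X, y)
print(mean_squared_error(y, tree.predict(X)))  # 0.0
```

The hyperparameter sweeps that follow (max_depth, min_samples_split, min_samples_leaf, max_features, criterion) are all ways of constraining this behaviour.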
max_depth = [2, 5, 10, 20, 50, 100, 150, 200, None]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in max_depth:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_max_depth = pd.DataFrame({'train_mae': train_mae,
'test_mae': val_mae,
'train_mse': train_mse,
'test_mse': val_mse,
'train_rmse': train_rmse,
'test_rmse': val_rmse}, index=max_depth)
result_max_depth
| | train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse |
|---|---|---|---|---|---|---|
| 2.0 | 3815.480925 | 3614.774442 | 2.645389e+07 | 2.241880e+07 | 5143.334947 | 4734.849732 |
| 5.0 | 2017.879079 | 2632.876525 | 8.616914e+06 | 1.638042e+07 | 2935.458046 | 4047.273239 |
| 10.0 | 432.658774 | 3125.795647 | 6.059501e+05 | 2.398058e+07 | 778.427956 | 4896.996807 |
| 20.0 | 5.152083 | 3292.034456 | 1.172666e+03 | 2.500349e+07 | 34.244207 | 5000.348965 |
| 50.0 | 0.000000 | 3022.997923 | 0.000000e+00 | 1.981871e+07 | 0.000000 | 4451.821535 |
| 100.0 | 0.000000 | 3022.997923 | 0.000000e+00 | 1.981871e+07 | 0.000000 | 4451.821535 |
| 150.0 | 0.000000 | 3022.997923 | 0.000000e+00 | 1.981871e+07 | 0.000000 | 4451.821535 |
| 200.0 | 0.000000 | 3022.997923 | 0.000000e+00 | 1.981871e+07 | 0.000000 | 4451.821535 |
| NaN | 0.000000 | 3022.997923 | 0.000000e+00 | 1.981871e+07 | 0.000000 | 4451.821535 |
def plot_performance(parameter, xlabel):
    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 5))
    ax1.plot(parameter, train_mae, label='Mean Absolute Errors: Train')
    ax1.plot(parameter, val_mae, label='Mean Absolute Errors: Validation')
    ax1.legend()
    ax1.set_xlabel(xlabel)
    ax1.set_ylabel('MAE')
    ax1.set_title('Mean Absolute Error')
    ax2.plot(parameter, train_mse, label='Mean Squared Errors: Train')
    ax2.plot(parameter, val_mse, label='Mean Squared Errors: Validation')
    ax2.legend()
    ax2.set_xlabel(xlabel)
    ax2.set_ylabel('MSE')
    ax2.set_title('Mean Squared Error')
    ax3.plot(parameter, train_rmse, label='Root Mean Squared Errors: Train')
    ax3.plot(parameter, val_rmse, label='Root Mean Squared Errors: Validation')
    ax3.legend()
    ax3.set_xlabel(xlabel)
    ax3.set_ylabel('RMSE')
    ax3.set_title('Root Mean Squared Error')
    plt.subplots_adjust(wspace=0.4)
    plt.show()
plot_performance(max_depth, 'Number of Max Depth')
min_sample_split = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in min_sample_split:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_min_split = pd.DataFrame({'train_mae': train_mae,
'test_mae': val_mae,
'train_mse': train_mse,
'test_mse': val_mse,
'train_rmse': train_rmse,
'test_rmse': val_rmse}, index=min_sample_split)
result_min_split
| | train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse |
|---|---|---|---|---|---|---|
| 2 | 2017.879079 | 2632.876525 | 8.616914e+06 | 1.638042e+07 | 2935.458046 | 4047.273239 |
| 5 | 2039.619128 | 2668.817737 | 8.705258e+06 | 1.668670e+07 | 2950.467421 | 4084.935487 |
| 10 | 2113.086587 | 2616.476637 | 9.294261e+06 | 1.696806e+07 | 3048.649072 | 4119.230275 |
| 20 | 2145.584653 | 2647.062769 | 9.529820e+06 | 1.720362e+07 | 3087.040677 | 4147.724514 |
| 50 | 2405.394644 | 2453.609358 | 1.205364e+07 | 1.397932e+07 | 3471.835964 | 3738.892500 |
| 100 | 3005.717701 | 2827.152308 | 1.960527e+07 | 1.591273e+07 | 4427.783672 | 3989.076370 |
| 150 | 3263.783929 | 3107.642604 | 2.216672e+07 | 1.741710e+07 | 4708.154557 | 4173.379489 |
| 200 | 4062.315694 | 3818.220233 | 3.372255e+07 | 3.003290e+07 | 5807.111693 | 5480.227973 |
| 250 | 4303.294377 | 4110.614686 | 3.558026e+07 | 3.377878e+07 | 5964.918713 | 5811.951725 |
| 300 | 4303.294377 | 4110.614686 | 3.558026e+07 | 3.377878e+07 | 5964.918713 | 5811.951725 |
| 350 | 4303.294377 | 4110.614686 | 3.558026e+07 | 3.377878e+07 | 5964.918713 | 5811.951725 |
| 400 | 5428.056042 | 5216.171946 | 4.828906e+07 | 4.676248e+07 | 6949.033069 | 6838.310091 |
| 450 | 5428.056042 | 5216.171946 | 4.828906e+07 | 4.676248e+07 | 6949.033069 | 6838.310091 |
| 500 | 5428.056042 | 5216.171946 | 4.828906e+07 | 4.676248e+07 | 6949.033069 | 6838.310091 |
plot_performance(min_sample_split, 'Number of min_samples_split')
Based on the validation metrics above (lowest validation RMSE), min_samples_split = 50 appears to be the best choice.
min_samples_leaf = [2, 5, 10, 20, 50, 100, 150, 200]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in min_samples_leaf:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_min_leaf = pd.DataFrame({'train_mae': train_mae,
'test_mae': val_mae,
'train_mse': train_mse,
'test_mse': val_mse,
'train_rmse': train_rmse,
'test_rmse': val_rmse}, index=min_samples_leaf)
result_min_leaf
| | train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse |
|---|---|---|---|---|---|---|
| 2 | 2434.628170 | 2495.077908 | 1.234958e+07 | 1.434912e+07 | 3514.197533 | 3788.023115 |
| 5 | 2400.117670 | 2497.152084 | 1.252384e+07 | 1.462245e+07 | 3538.903512 | 3823.930808 |
| 10 | 2529.828746 | 2693.498737 | 1.408535e+07 | 1.772781e+07 | 3753.045990 | 4210.440371 |
| 20 | 2542.741510 | 2787.395009 | 1.481632e+07 | 1.640833e+07 | 3849.197790 | 4050.719254 |
| 50 | 3023.045614 | 2977.038460 | 1.974248e+07 | 1.712419e+07 | 4443.251073 | 4138.138037 |
| 100 | 4096.456639 | 3858.048355 | 3.413148e+07 | 3.090315e+07 | 5842.215653 | 5559.060013 |
| 150 | 4303.294377 | 4110.614686 | 3.558026e+07 | 3.377878e+07 | 5964.918713 | 5811.951725 |
| 200 | 5393.094725 | 4675.050604 | 4.920576e+07 | 3.827844e+07 | 7014.681598 | 6186.957172 |
plot_performance(min_samples_leaf, 'Number of min_samples_leaf')
max_features = list(range(1, 40))
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in max_features:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=2, max_features=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_max_features = pd.DataFrame({'train_mae': train_mae,
'test_mae': val_mae,
'train_mse': train_mse,
'test_mse': val_mse,
'train_rmse': train_rmse,
'test_rmse': val_rmse}, index=max_features)
result_max_features
| | train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse |
|---|---|---|---|---|---|---|
| 1 | 4264.657260 | 4616.025908 | 3.212019e+07 | 3.590294e+07 | 5667.467929 | 5991.906389 |
| 2 | 3881.229772 | 4051.035539 | 2.541137e+07 | 3.018227e+07 | 5040.968720 | 5493.839657 |
| 3 | 3426.687014 | 3721.874845 | 2.364337e+07 | 2.306265e+07 | 4862.445246 | 4802.358837 |
| 4 | 3097.223558 | 3580.844608 | 1.866593e+07 | 2.397298e+07 | 4320.408298 | 4896.220703 |
| 5 | 3202.852552 | 3238.518236 | 2.015210e+07 | 2.089132e+07 | 4489.109075 | 4570.702049 |
| 6 | 3106.749747 | 3474.954972 | 1.910198e+07 | 2.435943e+07 | 4370.580762 | 4935.527600 |
| 7 | 2728.941066 | 2887.399486 | 1.673508e+07 | 1.758785e+07 | 4090.853564 | 4193.786896 |
| 8 | 2666.172170 | 3081.998005 | 1.515382e+07 | 1.999276e+07 | 3892.790373 | 4471.326484 |
| 9 | 2664.665506 | 3068.606964 | 1.523465e+07 | 1.875990e+07 | 3903.159341 | 4331.269661 |
| 10 | 2415.864738 | 2343.437093 | 1.281677e+07 | 1.175229e+07 | 3580.051743 | 3428.161970 |
| 11 | 2654.738472 | 2897.205908 | 1.495893e+07 | 1.681862e+07 | 3867.677397 | 4101.051515 |
| 12 | 2675.450560 | 3075.612277 | 1.679244e+07 | 1.990135e+07 | 4097.858027 | 4461.092723 |
| 13 | 2484.843948 | 2889.408818 | 1.352989e+07 | 1.920853e+07 | 3678.300071 | 4382.753376 |
| 14 | 2595.096972 | 3068.558237 | 1.430120e+07 | 1.838980e+07 | 3781.693126 | 4288.333164 |
| 15 | 2718.399858 | 3016.266256 | 1.549448e+07 | 1.918380e+07 | 3936.303111 | 4379.931101 |
| 16 | 2582.769468 | 2934.521535 | 1.614190e+07 | 1.652437e+07 | 4017.697807 | 4065.017308 |
| 17 | 2530.412028 | 2864.433139 | 1.385684e+07 | 1.625640e+07 | 3722.477498 | 4031.922083 |
| 18 | 2549.521050 | 2871.871144 | 1.324068e+07 | 1.791743e+07 | 3638.775045 | 4232.898402 |
| 19 | 2685.614858 | 2787.644288 | 1.566936e+07 | 1.919058e+07 | 3958.454428 | 4380.705239 |
| 20 | 2690.517661 | 2755.131712 | 1.593639e+07 | 1.592006e+07 | 3992.040610 | 3989.994524 |
| 21 | 2713.216617 | 2790.841853 | 1.680501e+07 | 1.661129e+07 | 4099.390921 | 4075.695516 |
| 22 | 2519.717968 | 2625.160921 | 1.336965e+07 | 1.485764e+07 | 3656.452782 | 3854.561401 |
| 23 | 2632.505031 | 2772.088734 | 1.505643e+07 | 1.744229e+07 | 3880.261976 | 4176.397070 |
| 24 | 2664.922298 | 3066.453710 | 1.623056e+07 | 1.739398e+07 | 4028.716796 | 4170.609643 |
| 25 | 2598.182466 | 2894.304121 | 1.510566e+07 | 1.545932e+07 | 3886.599621 | 3931.834754 |
| 26 | 2486.253606 | 2802.219441 | 1.275504e+07 | 1.889001e+07 | 3571.420374 | 4346.264177 |
| 27 | 2476.739796 | 2941.197604 | 1.267193e+07 | 2.079016e+07 | 3559.765707 | 4559.622518 |
| 28 | 2747.288799 | 2951.437696 | 1.719211e+07 | 1.743517e+07 | 4146.337260 | 4175.543852 |
| 29 | 2639.839210 | 2920.397189 | 1.610752e+07 | 1.798974e+07 | 4013.417492 | 4241.431158 |
| 30 | 2635.788221 | 2859.073720 | 1.488086e+07 | 1.725833e+07 | 3857.572241 | 4154.314315 |
| 31 | 2768.473489 | 2814.408148 | 1.718205e+07 | 1.580990e+07 | 4145.123584 | 3976.166184 |
| 32 | 2429.230821 | 2776.043621 | 1.260990e+07 | 1.723332e+07 | 3551.041823 | 4151.303390 |
| 33 | 2464.360038 | 2912.959982 | 1.308792e+07 | 2.071590e+07 | 3617.723629 | 4551.472378 |
| 34 | 2608.391222 | 3237.114353 | 1.397910e+07 | 2.308207e+07 | 3738.863757 | 4804.380732 |
| 35 | 2510.333978 | 2790.691469 | 1.327830e+07 | 1.607366e+07 | 3643.940532 | 4009.197502 |
| 36 | 2489.379869 | 2396.316350 | 1.250927e+07 | 1.174200e+07 | 3536.844009 | 3426.660796 |
| 37 | 2556.300489 | 2602.215159 | 1.308516e+07 | 1.471822e+07 | 3617.341451 | 3836.433054 |
| 38 | 2429.550738 | 2956.865052 | 1.267424e+07 | 2.228661e+07 | 3560.089871 | 4720.870151 |
| 39 | 2479.682919 | 2686.939230 | 1.285882e+07 | 1.576122e+07 | 3585.919783 | 3970.039915 |
plot_performance(max_features, 'max_features')
criterion = ['absolute_error', 'squared_error', 'friedman_mse', 'poisson']
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in criterion:
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=5, max_features=10, criterion=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_criterion = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=criterion)
result_criterion
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| absolute_error | 2538.360879 | 2528.292131 | 1.766422e+07 | 1.374484e+07 | 4202.881995 | 3707.403884 |
| squared_error | 2428.632887 | 2377.058551 | 1.302080e+07 | 1.166857e+07 | 3608.435026 | 3415.929063 |
| friedman_mse | 2428.632887 | 2377.058551 | 1.302080e+07 | 1.166857e+07 | 3608.435026 | 3415.929063 |
| poisson | 2696.912025 | 2604.426164 | 1.551771e+07 | 1.462207e+07 | 3939.252978 | 3823.881632 |
plot_performance(criterion, 'Criterion function')
squared_error and friedman_mse produce identical metrics here, and both beat absolute_error and poisson on validation error, so squared_error is used for the tuned model below.
Decision Tree Regressor (After tuning)
import altair as alt
def decision_tree_regression_tuning(X_train, y_train, X_test, y_test, index_train, index_test):
    # Hyperparameters fixed at the values chosen in the sweeps above.
    regressor = DecisionTreeRegressor(random_state=42, max_depth=5, min_samples_split=50, min_samples_leaf=5, max_features=10, criterion='squared_error').fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    dt_train = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=[index_train])
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    dt_test = pd.DataFrame({'mae': mae,
                            'mse': mse,
                            'rmse': rmse},
                           index=[index_test])
    dt_models = pd.concat([dt_train, dt_test])
    # The fitted tree is the same regardless of which split it is evaluated on,
    # so one plot suffices (the original drew the identical tree twice).
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.set_title(f'Decision Tree: {index_train}')
    plot_tree(regressor, ax=ax, filled=True)
    plt.tight_layout()
    plt.show()
    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    feature_names = X_train.columns.values
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Decision Tree Regressor - Feature Importance")
    plt.show()
    return dt_models
dt_tuning = decision_tree_regression_tuning(X_train, y_train, X_val, y_val, 'DT_Tune_Train', 'DT_Tune_Val')
dt_tuning
| mae | mse | rmse | |
|---|---|---|---|
| DT_Tune_Train | 2428.632887 | 1.302080e+07 | 3608.435026 |
| DT_Tune_Val | 2377.058551 | 1.166857e+07 | 3415.929063 |
dt_feature1_tuning = decision_tree_regression_tuning(X_train_feature1, y_train, X_val_feature1, y_val, 'DT_Tune_FEATURE1_Train', 'DT_Tune_FEATURE1_Val')
dt_feature1_tuning
| mae | mse | rmse | |
|---|---|---|---|
| DT_Tune_FEATURE1_Train | 2634.547381 | 1.511255e+07 | 3887.486135 |
| DT_Tune_FEATURE1_Val | 2843.669581 | 1.846317e+07 | 4296.879298 |
dt_feature2_tuning = decision_tree_regression_tuning(X_train_feature2, y_train, X_val_feature2, y_val, 'DT_Tune_FEATURE2_Train', 'DT_Tune_FEATURE2_Val')
dt_feature2_tuning
| mae | mse | rmse | |
|---|---|---|---|
| DT_Tune_FEATURE2_Train | 2640.609970 | 1.566922e+07 | 3958.435982 |
| DT_Tune_FEATURE2_Val | 2532.673206 | 1.454901e+07 | 3814.316998 |
dt_feature3_tuning = decision_tree_regression_tuning(X_train_feature3, y_train, X_val_feature3, y_val, 'DT_Tune_FEATURE3_Train', 'DT_Tune_FEATURE3_Val')
dt_feature3_tuning
| mae | mse | rmse | |
|---|---|---|---|
| DT_Tune_FEATURE3_Train | 2634.547381 | 1.511255e+07 | 3887.486135 |
| DT_Tune_FEATURE3_Val | 2855.303550 | 1.849346e+07 | 4300.402255 |
dt_model = pd.concat([dt, dt_feature1, dt_feature2, dt_feature3, dt_tuning, dt_feature1_tuning, dt_feature2_tuning, dt_feature3_tuning])
dt_model
| mae | mse | rmse | |
|---|---|---|---|
| Dtregressor_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_Val | 3022.997923 | 1.981871e+07 | 4451.821535 |
| Dtregressor_FEATURE1_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE1_Val | 3142.906175 | 2.105998e+07 | 4589.114999 |
| Dtregressor_FEATURE2_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE2_Val | 3379.599290 | 2.756409e+07 | 5250.151896 |
| Dtregressor_FEATURE3_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE3_Val | 3090.758525 | 2.071248e+07 | 4551.096100 |
| DT_Tune_Train | 2428.632887 | 1.302080e+07 | 3608.435026 |
| DT_Tune_Val | 2377.058551 | 1.166857e+07 | 3415.929063 |
| DT_Tune_FEATURE1_Train | 2634.547381 | 1.511255e+07 | 3887.486135 |
| DT_Tune_FEATURE1_Val | 2843.669581 | 1.846317e+07 | 4296.879298 |
| DT_Tune_FEATURE2_Train | 2640.609970 | 1.566922e+07 | 3958.435982 |
| DT_Tune_FEATURE2_Val | 2532.673206 | 1.454901e+07 | 3814.316998 |
| DT_Tune_FEATURE3_Train | 2634.547381 | 1.511255e+07 | 3887.486135 |
| DT_Tune_FEATURE3_Val | 2855.303550 | 1.849346e+07 | 4300.402255 |
Based on the feature-importance rankings above, the top 5 features carry most of the predictive signal.
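Extracting that top-5 subset from a fitted model is mechanical; here is a minimal sketch on synthetic data (the `feat_i` column names and sizes are placeholders, since the notebook's `X_train` is built earlier):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for X_train / y_train.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 8)),
                 columns=[f'feat_{i}' for i in range(8)])
y = 3 * X['feat_0'] + 2 * X['feat_3'] + rng.normal(size=200)

reg = RandomForestRegressor(random_state=42, n_estimators=50).fit(X, y)

# Rank features by importance and keep the five strongest.
order = np.argsort(reg.feature_importances_)[::-1]
top5 = X.columns[order[:5]].tolist()

# The reduced design matrix for a retrained model.
X_top5 = X[top5]
```

The same `X[top5]` slicing would be applied to the train, validation, and test splits before refitting.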
from sklearn.ensemble import RandomForestRegressor
def random_forest_regressor(X_train, y_train, X_test, y_test, index_train, index_test):
    regressor = RandomForestRegressor(random_state=42).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    rf_train = pd.DataFrame({'mae': mae,
                             'mse': mse,
                             'rmse': rmse},
                            index=[index_train])
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    rf_test = pd.DataFrame({'mae': mae,
                            'mse': mse,
                            'rmse': rmse},
                           index=[index_test])
    rf_models = pd.concat([rf_train, rf_test])
    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    feature_names = X_train.columns.values
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Random Forest Regressor - Feature Importance")
    plt.show()
    return rf_models
rf = random_forest_regressor(X_train, y_train, X_val, y_val, 'RFregressor_Train', 'RFregressor_Val')
rf
| mae | mse | rmse | |
|---|---|---|---|
| RFregressor_Train | 952.337505 | 2.045942e+06 | 1430.364188 |
| RFregressor_Val | 2203.343703 | 9.785882e+06 | 3128.239434 |
rf_feature1 = random_forest_regressor(X_train_feature1, y_train, X_val_feature1, y_val, 'RFregressor_FEATURE1_Train', 'RFregressor_FEATURE1_Val')
rf_feature1
| mae | mse | rmse | |
|---|---|---|---|
| RFregressor_FEATURE1_Train | 969.471965 | 2.116189e+06 | 1454.712785 |
| RFregressor_FEATURE1_Val | 2187.242154 | 9.871982e+06 | 3141.970979 |
rf_feature2 = random_forest_regressor(X_train_feature2, y_train, X_val_feature2, y_val, 'RFregressor_FEATURE2_Train', 'RFregressor_FEATURE2_Val')
rf_feature2
| mae | mse | rmse | |
|---|---|---|---|
| RFregressor_FEATURE2_Train | 954.917457 | 2.064596e+06 | 1436.870097 |
| RFregressor_FEATURE2_Val | 2206.446608 | 9.923375e+06 | 3150.138845 |
rf_feature3= random_forest_regressor(X_train_feature3, y_train, X_val_feature3, y_val, 'RFregressor_FEATURE3_Train', 'RFregressor_FEATURE3_Val')
rf_feature3
| mae | mse | rmse | |
|---|---|---|---|
| RFregressor_FEATURE3_Train | 969.307015 | 2.117008e+06 | 1454.994201 |
| RFregressor_FEATURE3_Val | 2190.343906 | 9.880833e+06 | 3143.379168 |
n_estimators = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in n_estimators:
    regressor = RandomForestRegressor(random_state=42, n_estimators=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_n_estimators = pd.DataFrame({'train_mae': train_mae,
                                    'test_mae': val_mae,
                                    'train_mse': train_mse,
                                    'test_mse': val_mse,
                                    'train_rmse': train_rmse,
                                    'test_rmse': val_rmse}, index=n_estimators)
result_n_estimators
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 2 | 1242.610916 | 2987.065738 | 6.745545e+06 | 1.691260e+07 | 2597.218708 | 4112.493373 |
| 5 | 1126.597799 | 2532.092481 | 3.257686e+06 | 1.255253e+07 | 1804.906078 | 3542.955581 |
| 10 | 1011.604971 | 2413.340169 | 2.431891e+06 | 1.166108e+07 | 1559.452098 | 3414.832983 |
| 20 | 992.655842 | 2264.938874 | 2.275209e+06 | 1.027619e+07 | 1508.379544 | 3205.649030 |
| 50 | 982.490668 | 2226.859317 | 2.143529e+06 | 9.939897e+06 | 1464.079453 | 3152.760198 |
| 100 | 952.337505 | 2203.343703 | 2.045942e+06 | 9.785882e+06 | 1430.364188 | 3128.239434 |
| 150 | 949.291605 | 2213.789471 | 2.042899e+06 | 9.824783e+06 | 1429.300119 | 3134.451066 |
| 200 | 944.937466 | 2229.819886 | 1.999551e+06 | 9.881873e+06 | 1414.054829 | 3143.544727 |
| 250 | 951.241607 | 2211.912820 | 2.018628e+06 | 9.801909e+06 | 1420.784127 | 3130.800055 |
| 300 | 954.003582 | 2232.031193 | 2.038388e+06 | 9.906177e+06 | 1427.721104 | 3147.407939 |
| 350 | 958.054202 | 2234.438987 | 2.072641e+06 | 9.914890e+06 | 1439.666916 | 3148.791886 |
| 400 | 957.339041 | 2230.991642 | 2.069329e+06 | 9.874852e+06 | 1438.516095 | 3142.427782 |
| 450 | 954.585142 | 2231.985573 | 2.064817e+06 | 9.862552e+06 | 1436.947271 | 3140.470014 |
| 500 | 953.531756 | 2228.501206 | 2.059465e+06 | 9.859875e+06 | 1435.083565 | 3140.043856 |
plot_performance(n_estimators, 'n_estimators')
Validation MAE and RMSE bottom out around n_estimators = 100 (val MAE ≈ 2203, val RMSE ≈ 3128); larger forests only add training time without improving validation error.
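As an aside, refitting a full forest for every candidate size is expensive. A cheaper pattern is to grow a single forest incrementally with `warm_start=True` and read the out-of-bag score after each growth step; a sketch on synthetic data (`make_regression` stands in for the notebook's features):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=42)

# warm_start=True keeps already-built trees, so each fit() only adds new ones;
# oob_score_ is recomputed over the whole ensemble after every fit.
reg = RandomForestRegressor(random_state=42, warm_start=True, oob_score=True,
                            n_estimators=10)
for n in [10, 50, 100]:
    reg.set_params(n_estimators=n)
    reg.fit(X, y)
    print(n, round(reg.oob_score_, 4))
```

The OOB score is an internal validation estimate, so this avoids touching the held-out split while scanning ensemble sizes.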
max_depth = [2, 5, 10, 20, 50, 100, 150, 200, None]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in max_depth:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_max_depth = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_depth)
result_max_depth
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 2.0 | 2853.453438 | 2754.565884 | 1.660048e+07 | 1.400695e+07 | 4074.368843 | 3742.585952 |
| 5.0 | 1690.756481 | 2245.768725 | 5.765749e+06 | 9.835034e+06 | 2401.197430 | 3136.085835 |
| 10.0 | 1005.398492 | 2173.776943 | 2.175859e+06 | 9.752327e+06 | 1475.079361 | 3122.871615 |
| 20.0 | 952.083485 | 2201.015083 | 2.039902e+06 | 9.757870e+06 | 1428.251283 | 3123.758967 |
| 50.0 | 952.337505 | 2203.343703 | 2.045942e+06 | 9.785882e+06 | 1430.364188 | 3128.239434 |
| 100.0 | 952.337505 | 2203.343703 | 2.045942e+06 | 9.785882e+06 | 1430.364188 | 3128.239434 |
| 150.0 | 952.337505 | 2203.343703 | 2.045942e+06 | 9.785882e+06 | 1430.364188 | 3128.239434 |
| 200.0 | 952.337505 | 2203.343703 | 2.045942e+06 | 9.785882e+06 | 1430.364188 | 3128.239434 |
| NaN | 952.337505 | 2203.343703 | 2.045942e+06 | 9.785882e+06 | 1430.364188 | 3128.239434 |
plot_performance(max_depth, 'max_depth')
Validation error is lowest at max_depth = 10 (val MAE ≈ 2174); deeper limits only push the training error further down, a sign of overfitting, and the rows for max_depth ≥ 50 are identical because the trees never grow that deep.
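That plateau can be verified directly: with no cap, a tree grows only as deep as the data requires, so any cap above that depth changes nothing. A small sketch on synthetic data (not the notebook's dataset):

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=42)

# Natural depth of the fully grown tree.
tree = DecisionTreeRegressor(random_state=42).fit(X, y)
print(tree.get_depth())

# A cap far above the natural depth yields the identical tree.
capped = DecisionTreeRegressor(random_state=42, max_depth=200).fit(X, y)
assert capped.get_depth() == tree.get_depth()
```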
min_sample_split = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in min_sample_split:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10, min_samples_split=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_min_split = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_sample_split)
result_min_split
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 2 | 1005.398492 | 2173.776943 | 2.175859e+06 | 9.752327e+06 | 1475.079361 | 3122.871615 |
| 5 | 1082.368610 | 2210.049998 | 2.674528e+06 | 9.757091e+06 | 1635.398376 | 3123.634214 |
| 10 | 1259.059300 | 2199.634626 | 3.666149e+06 | 9.799558e+06 | 1914.719060 | 3130.424535 |
| 20 | 1551.093075 | 2218.976044 | 5.563476e+06 | 1.016624e+07 | 2358.702243 | 3188.454215 |
| 50 | 2127.672054 | 2385.840523 | 1.062035e+07 | 1.120086e+07 | 3258.888244 | 3346.769003 |
| 100 | 2764.945928 | 2620.879736 | 1.731910e+07 | 1.446973e+07 | 4161.621958 | 3803.910447 |
| 150 | 3403.335781 | 3160.560872 | 2.565301e+07 | 2.200975e+07 | 5064.880152 | 4691.454519 |
| 200 | 3421.191917 | 3190.272747 | 2.573637e+07 | 2.232729e+07 | 5073.102927 | 4725.176182 |
| 250 | 3682.532220 | 3524.576999 | 2.841669e+07 | 2.477537e+07 | 5330.730588 | 4977.486548 |
| 300 | 4345.721384 | 4182.219041 | 3.478046e+07 | 3.153400e+07 | 5897.495915 | 5615.514061 |
| 350 | 7083.538618 | 7509.107319 | 8.436794e+07 | 8.566889e+07 | 9185.202492 | 9255.749141 |
| 400 | 8309.393546 | 8723.341999 | 1.105863e+08 | 1.118168e+08 | 10516.003668 | 10574.347796 |
| 450 | 8309.393546 | 8723.341999 | 1.105863e+08 | 1.118168e+08 | 10516.003668 | 10574.347796 |
| 500 | 8309.393546 | 8723.341999 | 1.105863e+08 | 1.118168e+08 | 10516.003668 | 10574.347796 |
plot_performance(min_sample_split, 'min_samples_split')
Validation MAE is actually lowest at min_samples_split = 2 (≈ 2174), but the train/validation gap is far smaller at 50 (train ≈ 2128 vs. val ≈ 2386), so min_samples_split = 50 is kept as the less overfit setting.
min_samples_leaf = [2, 5, 10, 20, 50, 100, 150, 200]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in min_samples_leaf:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10, min_samples_split=50, min_samples_leaf=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_min_leaf = pd.DataFrame({'train_mae': train_mae,
                                'test_mae': val_mae,
                                'train_mse': train_mse,
                                'test_mse': val_mse,
                                'train_rmse': train_rmse,
                                'test_rmse': val_rmse}, index=min_samples_leaf)
result_min_leaf
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 2 | 2141.827379 | 2404.326973 | 1.089825e+07 | 1.124311e+07 | 3301.249606 | 3353.075266 |
| 5 | 2163.526677 | 2390.473206 | 1.124620e+07 | 1.119163e+07 | 3353.536053 | 3345.389088 |
| 10 | 2204.406785 | 2410.624136 | 1.170962e+07 | 1.135133e+07 | 3421.932374 | 3369.173041 |
| 20 | 2287.403270 | 2442.456604 | 1.288107e+07 | 1.176767e+07 | 3589.020227 | 3430.404233 |
| 50 | 2831.516772 | 2692.189520 | 1.881492e+07 | 1.592891e+07 | 4337.617113 | 3991.104111 |
| 100 | 3654.767197 | 3382.997994 | 2.830094e+07 | 2.419499e+07 | 5319.863147 | 4918.840419 |
| 150 | 4771.574255 | 4476.755346 | 4.276172e+07 | 3.719275e+07 | 6539.244253 | 6098.585612 |
| 200 | 8309.393546 | 8723.341999 | 1.105863e+08 | 1.118168e+08 | 10516.003668 | 10574.347796 |
plot_performance(min_samples_leaf, 'min_samples_leaf')
Validation MAE and RMSE are both lowest at min_samples_leaf = 5 (val MAE ≈ 2390, val RMSE ≈ 3345).
max_features = list(range(1, 100))
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in max_features:
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10, min_samples_split=50, min_samples_leaf=5, max_features=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_max_features = pd.DataFrame({'train_mae': train_mae,
                                    'test_mae': val_mae,
                                    'train_mse': train_mse,
                                    'test_mse': val_mse,
                                    'train_rmse': train_rmse,
                                    'test_rmse': val_rmse}, index=max_features)
result_max_features
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 1 | 3477.675432 | 3591.978705 | 2.279570e+07 | 2.285192e+07 | 4774.484725 | 4780.368513 |
| 2 | 2996.785590 | 3130.866184 | 1.822634e+07 | 1.775541e+07 | 4269.231262 | 4213.717345 |
| 3 | 2697.977342 | 2804.672885 | 1.565209e+07 | 1.409895e+07 | 3956.271881 | 3754.857489 |
| 4 | 2563.007738 | 2634.722391 | 1.469556e+07 | 1.278529e+07 | 3833.478616 | 3575.652727 |
| 5 | 2455.740146 | 2482.778473 | 1.371981e+07 | 1.141160e+07 | 3704.025706 | 3378.106174 |
| 6 | 2395.197031 | 2415.635187 | 1.363482e+07 | 1.134146e+07 | 3692.535814 | 3367.708762 |
| 7 | 2365.982644 | 2405.957880 | 1.286377e+07 | 1.087528e+07 | 3586.610014 | 3297.768640 |
| 8 | 2357.595785 | 2358.718720 | 1.292670e+07 | 1.091105e+07 | 3595.372040 | 3303.187279 |
| 9 | 2339.322455 | 2363.001359 | 1.303042e+07 | 1.093239e+07 | 3609.767030 | 3306.416751 |
| 10 | 2288.021607 | 2313.586664 | 1.237356e+07 | 1.044550e+07 | 3517.606962 | 3231.950528 |
| 11 | 2303.571198 | 2345.275577 | 1.246210e+07 | 1.075239e+07 | 3530.170657 | 3279.083880 |
| 12 | 2263.198193 | 2324.972135 | 1.207774e+07 | 1.048390e+07 | 3475.304881 | 3237.885858 |
| 13 | 2261.675395 | 2321.685589 | 1.198885e+07 | 1.046523e+07 | 3462.491523 | 3235.000632 |
| 14 | 2284.428986 | 2320.541422 | 1.230656e+07 | 1.049975e+07 | 3508.070968 | 3240.331834 |
| 15 | 2259.455941 | 2324.687848 | 1.224559e+07 | 1.073812e+07 | 3499.369916 | 3276.906946 |
| 16 | 2269.028731 | 2324.564421 | 1.229451e+07 | 1.072337e+07 | 3506.352529 | 3274.655712 |
| 17 | 2269.294456 | 2320.041288 | 1.222199e+07 | 1.086599e+07 | 3495.995571 | 3296.360812 |
| 18 | 2248.662305 | 2321.096855 | 1.203000e+07 | 1.044803e+07 | 3468.428812 | 3232.341157 |
| 19 | 2258.168483 | 2356.655551 | 1.212543e+07 | 1.063474e+07 | 3482.159199 | 3261.095514 |
| 20 | 2233.110780 | 2367.705899 | 1.175932e+07 | 1.072561e+07 | 3429.186113 | 3274.998333 |
| 21 | 2265.113922 | 2304.691750 | 1.215032e+07 | 1.045364e+07 | 3485.731510 | 3233.209403 |
| 22 | 2240.266854 | 2335.620006 | 1.167223e+07 | 1.029991e+07 | 3416.464697 | 3209.348049 |
| 23 | 2224.690100 | 2318.560599 | 1.165417e+07 | 1.041771e+07 | 3413.820121 | 3227.648191 |
| 24 | 2230.372806 | 2337.121687 | 1.177480e+07 | 1.052904e+07 | 3431.442778 | 3244.848900 |
| 25 | 2219.945368 | 2362.964875 | 1.163162e+07 | 1.057774e+07 | 3410.515973 | 3252.343348 |
| 26 | 2240.318404 | 2344.560924 | 1.190931e+07 | 1.064153e+07 | 3450.986971 | 3262.135552 |
| 27 | 2227.629365 | 2353.467212 | 1.163935e+07 | 1.058192e+07 | 3411.648580 | 3252.986321 |
| 28 | 2220.995455 | 2334.881538 | 1.164116e+07 | 1.060648e+07 | 3411.914842 | 3256.758744 |
| 29 | 2219.492127 | 2359.982153 | 1.166252e+07 | 1.060287e+07 | 3415.043270 | 3256.205141 |
| 30 | 2222.899028 | 2355.995655 | 1.160157e+07 | 1.064918e+07 | 3406.107307 | 3263.307880 |
| 31 | 2229.859567 | 2347.962188 | 1.180031e+07 | 1.046774e+07 | 3435.158096 | 3235.388406 |
| 32 | 2206.067967 | 2345.454912 | 1.154127e+07 | 1.044915e+07 | 3397.244430 | 3232.515139 |
| 33 | 2220.604089 | 2363.516672 | 1.174859e+07 | 1.066493e+07 | 3427.622217 | 3265.720755 |
| 34 | 2210.573180 | 2350.170072 | 1.160315e+07 | 1.078372e+07 | 3406.339071 | 3283.857896 |
| 35 | 2201.999336 | 2375.997269 | 1.147708e+07 | 1.075874e+07 | 3387.783453 | 3280.051217 |
| 36 | 2225.451652 | 2329.382373 | 1.152563e+07 | 1.040532e+07 | 3394.942263 | 3225.727923 |
| 37 | 2199.660309 | 2329.480813 | 1.143512e+07 | 1.032473e+07 | 3381.584958 | 3213.212111 |
| 38 | 2211.380415 | 2330.893425 | 1.164872e+07 | 1.047942e+07 | 3413.022021 | 3237.193675 |
| 39 | 2220.664702 | 2321.428615 | 1.150099e+07 | 1.040542e+07 | 3391.311257 | 3225.742935 |
| 40 | 2190.897092 | 2349.662527 | 1.134702e+07 | 1.059901e+07 | 3368.533575 | 3255.611436 |
| 41 | 2208.679685 | 2380.184891 | 1.151432e+07 | 1.065604e+07 | 3393.275092 | 3264.359068 |
| 42 | 2195.172664 | 2363.248364 | 1.140986e+07 | 1.073709e+07 | 3377.848963 | 3276.749864 |
| 43 | 2197.798555 | 2375.819634 | 1.153330e+07 | 1.085194e+07 | 3396.071856 | 3294.228784 |
| 44 | 2206.237803 | 2335.165620 | 1.142035e+07 | 1.031231e+07 | 3379.400114 | 3211.277963 |
| 45 | 2202.357096 | 2369.670373 | 1.137317e+07 | 1.047362e+07 | 3372.412354 | 3236.297323 |
| 46 | 2200.605109 | 2350.466337 | 1.143111e+07 | 1.067409e+07 | 3380.991769 | 3267.122643 |
| 47 | 2202.805817 | 2345.531062 | 1.150952e+07 | 1.052906e+07 | 3392.568808 | 3244.851283 |
| 48 | 2196.774474 | 2343.252537 | 1.122175e+07 | 1.040008e+07 | 3349.888185 | 3224.915151 |
| 49 | 2202.828910 | 2339.605899 | 1.147040e+07 | 1.052662e+07 | 3386.797245 | 3244.475442 |
| 50 | 2195.341509 | 2375.389586 | 1.136672e+07 | 1.057528e+07 | 3371.456005 | 3251.965190 |
| 51 | 2193.798781 | 2357.174145 | 1.164871e+07 | 1.076966e+07 | 3413.020457 | 3281.716336 |
| 52 | 2192.617729 | 2349.609988 | 1.140911e+07 | 1.072870e+07 | 3377.737272 | 3275.468680 |
| 53 | 2201.018179 | 2361.040949 | 1.129995e+07 | 1.055747e+07 | 3361.539568 | 3249.226062 |
| 54 | 2167.359910 | 2358.374427 | 1.130564e+07 | 1.060692e+07 | 3362.385532 | 3256.826985 |
| 55 | 2186.617296 | 2350.329180 | 1.124623e+07 | 1.038978e+07 | 3353.539184 | 3223.317915 |
| 56 | 2175.129498 | 2334.029587 | 1.118126e+07 | 1.057609e+07 | 3343.839064 | 3252.089977 |
| 57 | 2184.533111 | 2379.362957 | 1.138006e+07 | 1.092604e+07 | 3373.435177 | 3305.456699 |
| 58 | 2206.831104 | 2376.727396 | 1.149347e+07 | 1.088292e+07 | 3390.201464 | 3298.927807 |
| 59 | 2167.928291 | 2352.624917 | 1.125031e+07 | 1.083861e+07 | 3354.147797 | 3292.204874 |
| 60 | 2190.262508 | 2340.763859 | 1.134159e+07 | 1.043096e+07 | 3367.728188 | 3229.699255 |
| 61 | 2186.351162 | 2398.651876 | 1.142174e+07 | 1.102970e+07 | 3379.606801 | 3321.099699 |
| 62 | 2189.000472 | 2368.537020 | 1.134646e+07 | 1.096177e+07 | 3368.450007 | 3310.856729 |
| 63 | 2196.230296 | 2392.547030 | 1.146475e+07 | 1.082730e+07 | 3385.963834 | 3290.486357 |
| 64 | 2191.070341 | 2364.543376 | 1.136020e+07 | 1.069429e+07 | 3370.489140 | 3270.212545 |
| 65 | 2189.735705 | 2375.952124 | 1.133595e+07 | 1.066541e+07 | 3366.890040 | 3265.793211 |
| 66 | 2179.877695 | 2404.505752 | 1.133625e+07 | 1.092867e+07 | 3366.934461 | 3305.853970 |
| 67 | 2181.647404 | 2381.641043 | 1.142899e+07 | 1.089971e+07 | 3380.678248 | 3301.470690 |
| 68 | 2200.062800 | 2383.520937 | 1.143584e+07 | 1.089394e+07 | 3381.691252 | 3300.597042 |
| 69 | 2193.185785 | 2372.937359 | 1.127669e+07 | 1.082489e+07 | 3358.077660 | 3290.120568 |
| 70 | 2179.722317 | 2378.917742 | 1.122143e+07 | 1.112525e+07 | 3349.840668 | 3335.453866 |
| 71 | 2185.913439 | 2396.766372 | 1.137832e+07 | 1.104587e+07 | 3373.176330 | 3323.532864 |
| 72 | 2170.520913 | 2384.924205 | 1.133528e+07 | 1.124914e+07 | 3366.790663 | 3353.973971 |
| 73 | 2180.124054 | 2390.532155 | 1.130259e+07 | 1.122649e+07 | 3361.932889 | 3350.594885 |
| 74 | 2187.528284 | 2402.264045 | 1.154054e+07 | 1.130720e+07 | 3397.137680 | 3362.618521 |
| 75 | 2165.658546 | 2389.996353 | 1.113513e+07 | 1.112608e+07 | 3336.933546 | 3335.578032 |
| 76 | 2178.107152 | 2409.681037 | 1.149347e+07 | 1.140646e+07 | 3390.202349 | 3377.344740 |
| 77 | 2168.237897 | 2401.050385 | 1.122791e+07 | 1.124049e+07 | 3350.807655 | 3352.684650 |
| 78 | 2181.473126 | 2403.966264 | 1.123609e+07 | 1.118304e+07 | 3352.027612 | 3344.105868 |
| 79 | 2182.974023 | 2391.513480 | 1.141083e+07 | 1.122284e+07 | 3377.991582 | 3350.050479 |
| 80 | 2175.044720 | 2391.072793 | 1.135441e+07 | 1.106569e+07 | 3369.629886 | 3326.513206 |
| 81 | 2189.078315 | 2420.074593 | 1.153326e+07 | 1.147696e+07 | 3396.064673 | 3387.765747 |
| 82 | 2171.294464 | 2397.885922 | 1.120208e+07 | 1.127856e+07 | 3346.951439 | 3358.356700 |
| 83 | 2179.697016 | 2378.928047 | 1.132295e+07 | 1.108011e+07 | 3364.959350 | 3328.680407 |
| 84 | 2151.080439 | 2384.881926 | 1.116961e+07 | 1.122427e+07 | 3342.095926 | 3350.264479 |
| 85 | 2179.423692 | 2396.384202 | 1.132146e+07 | 1.116307e+07 | 3364.737009 | 3341.118718 |
| 86 | 2177.231377 | 2395.471641 | 1.127447e+07 | 1.111767e+07 | 3357.748201 | 3334.316927 |
| 87 | 2181.048281 | 2379.886606 | 1.124974e+07 | 1.094590e+07 | 3354.063443 | 3308.458634 |
| 88 | 2184.524816 | 2367.938702 | 1.133174e+07 | 1.095595e+07 | 3366.264618 | 3309.976676 |
| 89 | 2177.634308 | 2348.983069 | 1.129787e+07 | 1.070280e+07 | 3361.231017 | 3271.513334 |
| 90 | 2181.626819 | 2373.402474 | 1.130044e+07 | 1.102033e+07 | 3361.613222 | 3319.688136 |
| 91 | 2178.033375 | 2385.980360 | 1.124835e+07 | 1.109387e+07 | 3353.856274 | 3330.746192 |
| 92 | 2171.328467 | 2386.442445 | 1.123837e+07 | 1.100789e+07 | 3352.367142 | 3317.814101 |
| 93 | 2176.888403 | 2391.573655 | 1.137194e+07 | 1.120430e+07 | 3372.230536 | 3347.282741 |
| 94 | 2174.701624 | 2385.790997 | 1.129514e+07 | 1.105274e+07 | 3360.824846 | 3324.566042 |
| 95 | 2163.637111 | 2368.454600 | 1.120318e+07 | 1.100253e+07 | 3347.114556 | 3317.006576 |
| 96 | 2175.740832 | 2375.865642 | 1.127626e+07 | 1.102723e+07 | 3358.014426 | 3320.727813 |
| 97 | 2160.552088 | 2386.074411 | 1.124433e+07 | 1.111819e+07 | 3353.256709 | 3334.395394 |
| 98 | 2167.347134 | 2374.431595 | 1.133766e+07 | 1.115557e+07 | 3367.143626 | 3339.995541 |
| 99 | 2165.591054 | 2384.505743 | 1.126598e+07 | 1.116858e+07 | 3356.483474 | 3341.941808 |
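These sweeps tune one hyperparameter at a time, which ignores interactions between them. A randomized joint search is a common alternative; a minimal sketch on synthetic data (the ranges and `n_iter` are illustrative, not the notebook's settings):

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=42)

# Distributions loosely mirroring the grids swept manually above.
param_dist = {
    'n_estimators': randint(50, 201),
    'max_depth': [5, 10, 20, None],
    'min_samples_split': randint(2, 51),
    'min_samples_leaf': randint(1, 11),
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_dist,
    n_iter=5, cv=3, random_state=42,
    scoring='neg_mean_absolute_error',
)
search.fit(X, y)
print(search.best_params_)
```

Unlike the one-at-a-time sweeps, each sampled candidate varies all four hyperparameters simultaneously, and cross-validation replaces the single train/validation split.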
| 58 | 2206.831104 | 2376.727396 | 1.149347e+07 | 1.088292e+07 | 3390.201464 | 3298.927807 |
| 59 | 2167.928291 | 2352.624917 | 1.125031e+07 | 1.083861e+07 | 3354.147797 | 3292.204874 |
| 60 | 2190.262508 | 2340.763859 | 1.134159e+07 | 1.043096e+07 | 3367.728188 | 3229.699255 |
| 61 | 2186.351162 | 2398.651876 | 1.142174e+07 | 1.102970e+07 | 3379.606801 | 3321.099699 |
| 62 | 2189.000472 | 2368.537020 | 1.134646e+07 | 1.096177e+07 | 3368.450007 | 3310.856729 |
| 63 | 2196.230296 | 2392.547030 | 1.146475e+07 | 1.082730e+07 | 3385.963834 | 3290.486357 |
| 64 | 2191.070341 | 2364.543376 | 1.136020e+07 | 1.069429e+07 | 3370.489140 | 3270.212545 |
| 65 | 2189.735705 | 2375.952124 | 1.133595e+07 | 1.066541e+07 | 3366.890040 | 3265.793211 |
| 66 | 2179.877695 | 2404.505752 | 1.133625e+07 | 1.092867e+07 | 3366.934461 | 3305.853970 |
| 67 | 2181.647404 | 2381.641043 | 1.142899e+07 | 1.089971e+07 | 3380.678248 | 3301.470690 |
| 68 | 2200.062800 | 2383.520937 | 1.143584e+07 | 1.089394e+07 | 3381.691252 | 3300.597042 |
| 69 | 2193.185785 | 2372.937359 | 1.127669e+07 | 1.082489e+07 | 3358.077660 | 3290.120568 |
| 70 | 2179.722317 | 2378.917742 | 1.122143e+07 | 1.112525e+07 | 3349.840668 | 3335.453866 |
| 71 | 2185.913439 | 2396.766372 | 1.137832e+07 | 1.104587e+07 | 3373.176330 | 3323.532864 |
| 72 | 2170.520913 | 2384.924205 | 1.133528e+07 | 1.124914e+07 | 3366.790663 | 3353.973971 |
| 73 | 2180.124054 | 2390.532155 | 1.130259e+07 | 1.122649e+07 | 3361.932889 | 3350.594885 |
| 74 | 2187.528284 | 2402.264045 | 1.154054e+07 | 1.130720e+07 | 3397.137680 | 3362.618521 |
| 75 | 2165.658546 | 2389.996353 | 1.113513e+07 | 1.112608e+07 | 3336.933546 | 3335.578032 |
| 76 | 2178.107152 | 2409.681037 | 1.149347e+07 | 1.140646e+07 | 3390.202349 | 3377.344740 |
| 77 | 2168.237897 | 2401.050385 | 1.122791e+07 | 1.124049e+07 | 3350.807655 | 3352.684650 |
| 78 | 2181.473126 | 2403.966264 | 1.123609e+07 | 1.118304e+07 | 3352.027612 | 3344.105868 |
| 79 | 2182.974023 | 2391.513480 | 1.141083e+07 | 1.122284e+07 | 3377.991582 | 3350.050479 |
| 80 | 2175.044720 | 2391.072793 | 1.135441e+07 | 1.106569e+07 | 3369.629886 | 3326.513206 |
| 81 | 2189.078315 | 2420.074593 | 1.153326e+07 | 1.147696e+07 | 3396.064673 | 3387.765747 |
| 82 | 2171.294464 | 2397.885922 | 1.120208e+07 | 1.127856e+07 | 3346.951439 | 3358.356700 |
| 83 | 2179.697016 | 2378.928047 | 1.132295e+07 | 1.108011e+07 | 3364.959350 | 3328.680407 |
| 84 | 2151.080439 | 2384.881926 | 1.116961e+07 | 1.122427e+07 | 3342.095926 | 3350.264479 |
| 85 | 2179.423692 | 2396.384202 | 1.132146e+07 | 1.116307e+07 | 3364.737009 | 3341.118718 |
| 86 | 2177.231377 | 2395.471641 | 1.127447e+07 | 1.111767e+07 | 3357.748201 | 3334.316927 |
| 87 | 2181.048281 | 2379.886606 | 1.124974e+07 | 1.094590e+07 | 3354.063443 | 3308.458634 |
| 88 | 2184.524816 | 2367.938702 | 1.133174e+07 | 1.095595e+07 | 3366.264618 | 3309.976676 |
| 89 | 2177.634308 | 2348.983069 | 1.129787e+07 | 1.070280e+07 | 3361.231017 | 3271.513334 |
| 90 | 2181.626819 | 2373.402474 | 1.130044e+07 | 1.102033e+07 | 3361.613222 | 3319.688136 |
| 91 | 2178.033375 | 2385.980360 | 1.124835e+07 | 1.109387e+07 | 3353.856274 | 3330.746192 |
| 92 | 2171.328467 | 2386.442445 | 1.123837e+07 | 1.100789e+07 | 3352.367142 | 3317.814101 |
| 93 | 2176.888403 | 2391.573655 | 1.137194e+07 | 1.120430e+07 | 3372.230536 | 3347.282741 |
| 94 | 2174.701624 | 2385.790997 | 1.129514e+07 | 1.105274e+07 | 3360.824846 | 3324.566042 |
| 95 | 2163.637111 | 2368.454600 | 1.120318e+07 | 1.100253e+07 | 3347.114556 | 3317.006576 |
| 96 | 2175.740832 | 2375.865642 | 1.127626e+07 | 1.102723e+07 | 3358.014426 | 3320.727813 |
| 97 | 2160.552088 | 2386.074411 | 1.124433e+07 | 1.111819e+07 | 3353.256709 | 3334.395394 |
| 98 | 2167.347134 | 2374.431595 | 1.133766e+07 | 1.115557e+07 | 3367.143626 | 3339.995541 |
| 99 | 2165.591054 | 2384.505743 | 1.126598e+07 | 1.116858e+07 | 3356.483474 | 3341.941808 |
plot_performance(max_features, 'Number of max_features')
The best max_features = 95
def random_forest_regressor_tuning(X_train, y_train, X_test, y_test, index_train, index_test):
    # Fit a random forest with the hyperparameters chosen in the sweeps above
    regressor = RandomForestRegressor(random_state=42, n_estimators=100, max_depth=10,
                                      min_samples_split=50, min_samples_leaf=5,
                                      max_features=95).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_test = regressor.predict(X_test)
    # Training-set metrics
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    rf_train = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse}, index=[index_train])
    # Validation-set metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    rf_test = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse}, index=[index_test])
    rf_models = pd.concat([rf_train, rf_test])
    # Plot feature importances, most important at the top
    feature_importances = regressor.feature_importances_
    sorted_indices = np.argsort(feature_importances)[::-1]
    feature_names = X_train.columns.values
    sorted_feature_importances = feature_importances[sorted_indices]
    sorted_feature_names = feature_names[sorted_indices]
    plt.figure(figsize=(6, 20))
    plt.barh(range(len(sorted_feature_importances)), sorted_feature_importances[::-1], align='center')
    plt.yticks(range(len(sorted_feature_importances)), sorted_feature_names[::-1])
    plt.xlabel("Feature Importance")
    plt.ylabel("Features")
    plt.title("Random Forest Regressor - Feature Importance")
    plt.show()
    return rf_models
rf_tuning = random_forest_regressor_tuning(X_train, y_train, X_val, y_val, 'RF_Tune_Train', 'RF_Tune_Val')
rf_tuning
| mae | mse | rmse | |
|---|---|---|---|
| RF_Tune_Train | 2163.637111 | 1.120318e+07 | 3347.114556 |
| RF_Tune_Val | 2368.454600 | 1.100253e+07 | 3317.006576 |
rf_feature1_tuning = random_forest_regressor_tuning(X_train_feature1, y_train, X_val_feature1, y_val, 'RF_Tune_FEATURE1_Train', 'RF_Tune_FEATURE1_Val')
rf_feature1_tuning
| mae | mse | rmse | |
|---|---|---|---|
| RF_Tune_FEATURE1_Train | 2217.155423 | 1.176883e+07 | 3430.572444 |
| RF_Tune_FEATURE1_Val | 2406.069268 | 1.134822e+07 | 3368.712046 |
rf_feature2_tuning = random_forest_regressor_tuning(X_train_feature2, y_train, X_val_feature2, y_val, 'RF_Tune_FEATURE2_Train', 'RF_Tune_FEATURE2_Val')
rf_feature2_tuning
| mae | mse | rmse | |
|---|---|---|---|
| RF_Tune_FEATURE2_Train | 2162.109950 | 1.127638e+07 | 3358.032533 |
| RF_Tune_FEATURE2_Val | 2389.063839 | 1.112963e+07 | 3336.109354 |
rf_feature3_tuning = random_forest_regressor_tuning(X_train_feature3, y_train, X_val_feature3, y_val, 'RF_Tune_FEATURE3_Train', 'RF_Tune_FEATURE3_Val')
rf_feature3_tuning
| mae | mse | rmse | |
|---|---|---|---|
| RF_Tune_FEATURE3_Train | 2216.333066 | 1.176684e+07 | 3430.283430 |
| RF_Tune_FEATURE3_Val | 2403.261234 | 1.134797e+07 | 3368.674639 |
rf_model = pd.concat([rf, rf_feature1, rf_feature2, rf_feature3, rf_tuning, rf_feature1_tuning, rf_feature2_tuning, rf_feature3_tuning])
rf_model
| mae | mse | rmse | |
|---|---|---|---|
| RFregressor_Train | 952.337505 | 2.045942e+06 | 1430.364188 |
| RFregressor_Val | 2203.343703 | 9.785882e+06 | 3128.239434 |
| RFregressor_FEATURE1_Train | 969.471965 | 2.116189e+06 | 1454.712785 |
| RFregressor_FEATURE1_Val | 2187.242154 | 9.871982e+06 | 3141.970979 |
| RFregressor_FEATURE2_Train | 954.917457 | 2.064596e+06 | 1436.870097 |
| RFregressor_FEATURE2_Val | 2206.446608 | 9.923375e+06 | 3150.138845 |
| RFregressor_FEATURE3_Train | 969.307015 | 2.117008e+06 | 1454.994201 |
| RFregressor_FEATURE3_Val | 2190.343906 | 9.880833e+06 | 3143.379168 |
| RF_Tune_Train | 2163.637111 | 1.120318e+07 | 3347.114556 |
| RF_Tune_Val | 2368.454600 | 1.100253e+07 | 3317.006576 |
| RF_Tune_FEATURE1_Train | 2217.155423 | 1.176883e+07 | 3430.572444 |
| RF_Tune_FEATURE1_Val | 2406.069268 | 1.134822e+07 | 3368.712046 |
| RF_Tune_FEATURE2_Train | 2162.109950 | 1.127638e+07 | 3358.032533 |
| RF_Tune_FEATURE2_Val | 2389.063839 | 1.112963e+07 | 3336.109354 |
| RF_Tune_FEATURE3_Train | 2216.333066 | 1.176684e+07 | 3430.283430 |
| RF_Tune_FEATURE3_Val | 2403.261234 | 1.134797e+07 | 3368.674639 |
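Tuning one hyperparameter at a time, as above, can miss interactions between parameters (the best `max_depth` may shift once `min_samples_split` changes). As an alternative, scikit-learn's `GridSearchCV` searches the joint grid with cross-validation. The sketch below is a minimal illustration on synthetic data standing in for the notebook's training set; the grid values are examples, not the ones used above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the notebook's X_train / y_train
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=42)

# Search a small joint grid instead of sweeping one parameter at a time
param_grid = {
    'max_depth': [5, 10],
    'min_samples_split': [2, 50],
}
search = GridSearchCV(
    RandomForestRegressor(random_state=42, n_estimators=50),
    param_grid,
    scoring='neg_mean_absolute_error',  # GridSearchCV maximizes, so MAE is negated
    cv=3,
)
search.fit(X, y)
print(search.best_params_)  # best joint combination found by cross-validation
```

For larger grids, `RandomizedSearchCV` with a fixed `n_iter` keeps the cost bounded while still exploring the joint space.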
from sklearn.ensemble import GradientBoostingRegressor
# Note: despite the name, this function uses scikit-learn's GradientBoostingRegressor,
# not the XGBoost library.
def xgboost(X_train, y_train, X_test, y_test, index_train, index_test):
    xgboost_reg = GradientBoostingRegressor().fit(X_train, y_train)
    y_preds_train = xgboost_reg.predict(X_train)
    y_preds_test = xgboost_reg.predict(X_test)
    # Training-set metrics (y_true first, y_pred second)
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    xgboost_train = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse}, index=[index_train])
    # Validation-set metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    xgboost_test = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse}, index=[index_test])
    # Plot predicted vs. actual target values for both sets
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
    axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')
    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(f'{index_train}: Comparison of Actual vs. Predicted Target')
    axes[0].legend()
    axes[1].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')
    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(f'{index_test}: Comparison of Actual vs. Predicted Target')
    axes[1].legend()
    plt.show()
    xgboost_models = pd.concat([xgboost_train, xgboost_test])
    return xgboost_models
xg = xgboost(X_train, y_train, X_val, y_val, 'XGBoost_Train', 'XGBoost_Val')
xg
| mae | mse | rmse | |
|---|---|---|---|
| XGBoost_Train | 1062.784899 | 2.039445e+06 | 1428.091324 |
| XGBoost_Val | 2370.002169 | 1.150793e+07 | 3392.334072 |
xg_feature1 = xgboost(X_train_feature1, y_train, X_val_feature1, y_val, 'XGBoost_FEATURE1_Train', 'XGBoost_FEATURE1_Val')
xg_feature1
| mae | mse | rmse | |
|---|---|---|---|
| XGBoost_FEATURE1_Train | 1251.945221 | 2.949687e+06 | 1717.465394 |
| XGBoost_FEATURE1_Val | 2381.406002 | 1.206589e+07 | 3473.598919 |
xg_feature2 = xgboost(X_train_feature2, y_train, X_val_feature2, y_val, 'XGBoost_FEATURE2_Train', 'XGBoost_FEATURE2_Val')
xg_feature2
| mae | mse | rmse | |
|---|---|---|---|
| XGBoost_FEATURE2_Train | 1114.508146 | 2.326707e+06 | 1525.354768 |
| XGBoost_FEATURE2_Val | 2330.471706 | 1.189117e+07 | 3448.357665 |
xg_feature3 = xgboost(X_train_feature3, y_train, X_val_feature3, y_val, 'XGBoost_FEATURE3_Train', 'XGBoost_FEATURE3_Val')
xg_feature3
| mae | mse | rmse | |
|---|---|---|---|
| XGBoost_FEATURE3_Train | 1251.945221 | 2.949687e+06 | 1717.465394 |
| XGBoost_FEATURE3_Val | 2390.399367 | 1.221464e+07 | 3494.944348 |
learning_rate = [0.001, 0.01, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 1]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in learning_rate:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_learning_rate = pd.DataFrame({'train_mae': train_mae,
                                     'test_mae': val_mae,
                                     'train_mse': train_mse,
                                     'test_mse': val_mse,
                                     'train_rmse': train_rmse,
                                     'test_rmse': val_rmse}, index=learning_rate)
result_learning_rate
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 0.001 | 7637.367424 | 7955.366567 | 9.355801e+07 | 9.258699e+07 | 9672.538934 | 9622.213450 |
| 0.010 | 3905.271359 | 3931.880791 | 2.514555e+07 | 2.310171e+07 | 5014.533708 | 4806.423923 |
| 0.100 | 1062.784899 | 2370.547474 | 2.039445e+06 | 1.118598e+07 | 1428.091324 | 3344.544623 |
| 0.150 | 774.505422 | 2394.947704 | 1.058870e+06 | 1.093955e+07 | 1029.014099 | 3307.499695 |
| 0.200 | 577.790777 | 2377.679801 | 5.425335e+05 | 1.116542e+07 | 736.568745 | 3341.469584 |
| 0.300 | 313.369595 | 2315.334644 | 1.605535e+05 | 1.056805e+07 | 400.691277 | 3250.854066 |
| 0.400 | 207.376570 | 2513.281874 | 7.047907e+04 | 1.281325e+07 | 265.478941 | 3579.559796 |
| 0.500 | 109.283164 | 2778.266649 | 2.058349e+04 | 1.437993e+07 | 143.469464 | 3792.087221 |
| 1.000 | 14.395943 | 3540.087158 | 3.462054e+02 | 2.565066e+07 | 18.606595 | 5064.647788 |
plot_performance(learning_rate, 'Number of learning_rate')
The best learning_rate = 0.3
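The sweep loop above is repeated nearly verbatim for each hyperparameter. It could be factored into a single helper; the sketch below does this with a hypothetical `sweep_param` function, demonstrated on synthetic data standing in for the notebook's train/validation split (column names and grid values are illustrative, not the notebook's).

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

def sweep_param(param_name, values, X_train, y_train, X_val, y_val, **fixed):
    """Fit one GradientBoostingRegressor per candidate value and collect metrics."""
    rows = []
    for v in values:
        reg = GradientBoostingRegressor(random_state=42, **fixed, **{param_name: v})
        reg.fit(X_train, y_train)
        pt, pv = reg.predict(X_train), reg.predict(X_val)
        rows.append({'train_mae': mean_absolute_error(y_train, pt),
                     'val_mae': mean_absolute_error(y_val, pv),
                     'train_rmse': mean_squared_error(y_train, pt) ** 0.5,
                     'val_rmse': mean_squared_error(y_val, pv) ** 0.5})
    return pd.DataFrame(rows, index=values)

# Example on synthetic data standing in for the notebook's split
X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.25, random_state=42)
result = sweep_param('n_estimators', [10, 50, 100], X_tr, y_tr, X_va, y_va,
                     learning_rate=0.3)
```

The same helper then covers `max_depth`, `min_samples_split`, and the other sweeps by changing only `param_name` and the fixed keyword arguments.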
n_estimators = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in n_estimators:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_n_estimators = pd.DataFrame({'train_mae': train_mae,
                                    'test_mae': val_mae,
                                    'train_mse': train_mse,
                                    'test_mse': val_mse,
                                    'train_rmse': train_rmse,
                                    'test_rmse': val_rmse}, index=n_estimators)
result_n_estimators
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 2 | 4809.319733 | 4902.595605 | 3.754147e+07 | 3.491928e+07 | 6127.109831 | 5909.253875 |
| 5 | 2722.114794 | 2678.412655 | 1.323165e+07 | 1.176303e+07 | 3637.533961 | 3429.727588 |
| 10 | 1897.562236 | 2159.119324 | 7.202624e+06 | 9.456199e+06 | 2683.770470 | 3075.093266 |
| 20 | 1477.064331 | 2203.857035 | 4.099596e+06 | 9.593891e+06 | 2024.745920 | 3097.400617 |
| 50 | 792.346707 | 2321.985863 | 9.871235e+05 | 1.065937e+07 | 993.540893 | 3264.868751 |
| 100 | 313.369595 | 2315.334644 | 1.605535e+05 | 1.056805e+07 | 400.691277 | 3250.854066 |
| 150 | 144.448363 | 2310.388526 | 3.341922e+04 | 1.052438e+07 | 182.809255 | 3244.130283 |
| 200 | 65.244271 | 2314.904790 | 6.547823e+03 | 1.055840e+07 | 80.918621 | 3249.369306 |
| 250 | 28.543625 | 2317.728432 | 1.360793e+03 | 1.056667e+07 | 36.888930 | 3250.641373 |
| 300 | 13.864842 | 2318.692472 | 3.140040e+02 | 1.057643e+07 | 17.720157 | 3252.141681 |
| 350 | 6.197603 | 2318.552079 | 6.283897e+01 | 1.057725e+07 | 7.927104 | 3252.269032 |
| 400 | 3.028791 | 2318.410223 | 1.456341e+01 | 1.057707e+07 | 3.816203 | 3252.240054 |
| 450 | 1.490880 | 2318.125249 | 3.513121e+00 | 1.057522e+07 | 1.874332 | 3251.955903 |
| 500 | 0.736964 | 2318.156858 | 8.794494e-01 | 1.057561e+07 | 0.937790 | 3252.016988 |
plot_performance(n_estimators, 'Number of n_estimators')
The best n_estimators = 10
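Instead of sweeping `n_estimators` by hand, `GradientBoostingRegressor` can stop adding stages on its own via `n_iter_no_change` and `validation_fraction`. This sketch uses synthetic data standing in for the notebook's training set; the specific thresholds are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for the notebook's X_train / y_train
X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=42)

# Let boosting stop itself once a 10% internal validation split stops improving
reg = GradientBoostingRegressor(
    random_state=42,
    learning_rate=0.3,
    n_estimators=500,        # upper bound; early stopping usually ends far sooner
    n_iter_no_change=5,      # stop after 5 rounds without validation improvement
    validation_fraction=0.1,
).fit(X, y)
print(reg.n_estimators_)     # actual number of boosting stages fitted
```

This keeps the overfitting visible in the table above (training error collapsing toward zero while validation error plateaus) from ever being trained in the first place.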
max_depth = list(range(1, 20))
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in max_depth:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_max_depth = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=max_depth)
result_max_depth
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 1 | 3033.378689 | 3030.005878 | 1.813618e+07 | 1.607499e+07 | 4258.659469 | 4009.362335 |
| 2 | 2245.290935 | 2456.658308 | 1.015635e+07 | 1.192099e+07 | 3186.902536 | 3452.678622 |
| 3 | 1897.562236 | 2159.119324 | 7.202624e+06 | 9.456199e+06 | 2683.770470 | 3075.093266 |
| 4 | 1419.249862 | 2416.253854 | 3.876618e+06 | 1.078335e+07 | 1968.912975 | 3283.801910 |
| 5 | 999.995619 | 2472.558868 | 1.678759e+06 | 1.165968e+07 | 1295.669407 | 3414.628081 |
| 6 | 640.556450 | 2481.935044 | 6.283118e+05 | 1.286504e+07 | 792.661203 | 3586.787037 |
| 7 | 470.013470 | 2493.233621 | 3.410014e+05 | 1.262973e+07 | 583.953221 | 3553.832539 |
| 8 | 335.950780 | 2656.258139 | 1.634064e+05 | 1.408870e+07 | 404.235563 | 3753.492209 |
| 9 | 295.677782 | 2662.952255 | 1.278064e+05 | 1.531712e+07 | 357.500253 | 3913.709262 |
| 10 | 248.782725 | 2598.521394 | 9.675634e+04 | 1.453755e+07 | 311.056815 | 3812.814487 |
| 11 | 245.523077 | 2826.931898 | 9.516235e+04 | 1.689388e+07 | 308.483949 | 4110.216132 |
| 12 | 237.048890 | 2809.766082 | 9.032624e+04 | 1.737428e+07 | 300.543237 | 4168.246205 |
| 13 | 234.917294 | 2787.765145 | 8.917618e+04 | 1.630613e+07 | 298.623812 | 4038.084888 |
| 14 | 234.685055 | 2852.724386 | 8.861161e+04 | 1.695581e+07 | 297.677025 | 4117.743422 |
| 15 | 234.664120 | 3032.458092 | 8.834857e+04 | 1.933811e+07 | 297.234878 | 4397.511641 |
| 16 | 234.664120 | 2972.875571 | 8.829941e+04 | 1.938559e+07 | 297.152159 | 4402.907360 |
| 17 | 234.664120 | 3000.691313 | 8.826067e+04 | 1.981977e+07 | 297.086973 | 4451.939605 |
| 18 | 234.664120 | 2987.046102 | 8.826392e+04 | 1.863146e+07 | 297.092440 | 4316.417172 |
| 19 | 234.664120 | 3026.466533 | 8.825935e+04 | 1.977414e+07 | 297.084759 | 4446.812458 |
plot_performance(max_depth, 'Number of max_depth')
The best max_depth = 3
min_sample_split = [2, 5, 10, 20, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in min_sample_split:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_min_split = pd.DataFrame({'train_mae': train_mae,
                                 'test_mae': val_mae,
                                 'train_mse': train_mse,
                                 'test_mse': val_mse,
                                 'train_rmse': train_rmse,
                                 'test_rmse': val_rmse}, index=min_sample_split)
result_min_split
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 2 | 1897.562236 | 2159.119324 | 7.202624e+06 | 9.456199e+06 | 2683.770470 | 3075.093266 |
| 5 | 1898.441015 | 2159.119324 | 7.276317e+06 | 9.456199e+06 | 2697.464861 | 3075.093266 |
| 10 | 1883.888592 | 2223.960670 | 7.101234e+06 | 1.012880e+07 | 2664.814129 | 3182.577178 |
| 20 | 1889.719239 | 2221.183396 | 7.194737e+06 | 1.012559e+07 | 2682.300669 | 3182.073600 |
| 50 | 1948.157575 | 2225.500680 | 7.593369e+06 | 1.022215e+07 | 2755.606806 | 3197.209912 |
| 100 | 2096.057554 | 2384.481232 | 9.258531e+06 | 1.087457e+07 | 3042.783404 | 3297.660784 |
| 150 | 2146.909054 | 2377.424572 | 9.811386e+06 | 1.117483e+07 | 3132.313169 | 3342.877288 |
| 200 | 2216.788479 | 2464.222510 | 1.054611e+07 | 1.134667e+07 | 3247.477312 | 3368.481882 |
| 250 | 2265.416683 | 2447.021771 | 1.106094e+07 | 1.088560e+07 | 3325.799465 | 3299.333507 |
| 300 | 2314.863814 | 2506.663830 | 1.128385e+07 | 1.183549e+07 | 3359.143898 | 3440.274393 |
| 350 | 2392.093288 | 2610.630323 | 1.176835e+07 | 1.231941e+07 | 3430.502935 | 3509.902016 |
| 400 | 2434.891181 | 2498.018701 | 1.269111e+07 | 1.218749e+07 | 3562.458562 | 3491.059159 |
| 450 | 2603.960333 | 2718.318055 | 1.388002e+07 | 1.292833e+07 | 3725.590335 | 3595.598472 |
| 500 | 2783.687584 | 2816.446267 | 1.549812e+07 | 1.390165e+07 | 3936.765194 | 3728.490992 |
plot_performance(min_sample_split, 'Number of min_samples_split')
The best min_samples_split = 5
min_samples_leaf = [1, 2, 5, 10, 20, 50, 100, 150, 200]
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in min_samples_leaf:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=5, min_samples_leaf=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_min_leaf = pd.DataFrame({'train_mae': train_mae,
                                'test_mae': val_mae,
                                'train_mse': train_mse,
                                'test_mse': val_mse,
                                'train_rmse': train_rmse,
                                'test_rmse': val_rmse}, index=min_samples_leaf)
result_min_leaf
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 1 | 1898.441015 | 2159.119324 | 7.276317e+06 | 9.456199e+06 | 2697.464861 | 3075.093266 |
| 2 | 1874.827835 | 2189.431910 | 7.268464e+06 | 9.714303e+06 | 2696.008869 | 3116.777656 |
| 5 | 1901.049412 | 2228.578556 | 7.681601e+06 | 1.003300e+07 | 2771.570045 | 3167.491331 |
| 10 | 1892.191028 | 2315.660004 | 7.659879e+06 | 1.107284e+07 | 2767.648591 | 3327.588306 |
| 20 | 2045.176632 | 2403.469490 | 8.953731e+06 | 1.128570e+07 | 2992.278621 | 3359.420099 |
| 50 | 2209.278612 | 2502.178592 | 1.136276e+07 | 1.180240e+07 | 3370.869384 | 3435.462618 |
| 100 | 2555.035352 | 2634.764948 | 1.544620e+07 | 1.431501e+07 | 3930.165889 | 3783.517729 |
| 150 | 3086.782447 | 3017.239913 | 2.107417e+07 | 1.943793e+07 | 4590.661456 | 4408.846623 |
| 200 | 3768.490867 | 3503.482440 | 2.872838e+07 | 2.346540e+07 | 5359.886515 | 4844.109647 |
plot_performance(min_samples_leaf, 'Number of min_samples_leaf')
The best min_samples_leaf = 1
max_features = list(range(1, 70))
train_mae = []
val_mae = []
train_mse = []
val_mse = []
train_rmse = []
val_rmse = []
for k in max_features:
    regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=5, min_samples_leaf=1, max_features=k).fit(X_train, y_train)
    y_preds_train = regressor.predict(X_train)
    y_preds_val = regressor.predict(X_val)
    train_mae.append(mean_absolute_error(y_train, y_preds_train))
    val_mae.append(mean_absolute_error(y_val, y_preds_val))
    train_mse.append(mean_squared_error(y_train, y_preds_train))
    val_mse.append(mean_squared_error(y_val, y_preds_val))
    train_rmse.append(mean_squared_error(y_train, y_preds_train, squared=False))
    val_rmse.append(mean_squared_error(y_val, y_preds_val, squared=False))
result_max_features = pd.DataFrame({'train_mae': train_mae,
                                    'test_mae': val_mae,
                                    'train_mse': train_mse,
                                    'test_mse': val_mse,
                                    'train_rmse': train_rmse,
                                    'test_rmse': val_rmse}, index=max_features)
result_max_features
| train_mae | test_mae | train_mse | test_mse | train_rmse | test_rmse | |
|---|---|---|---|---|---|---|
| 1 | 2992.389833 | 3547.188460 | 1.554257e+07 | 2.249611e+07 | 3942.406301 | 4743.005916 |
| 2 | 2574.093971 | 3019.681125 | 1.261069e+07 | 1.712248e+07 | 3551.153268 | 4137.931304 |
| 3 | 2275.514408 | 2477.041671 | 1.013244e+07 | 1.175872e+07 | 3183.149577 | 3429.098479 |
| 4 | 2298.232125 | 2802.968884 | 1.082636e+07 | 1.403492e+07 | 3290.343262 | 3746.321089 |
| 5 | 2237.959004 | 2657.553610 | 9.969061e+06 | 1.269945e+07 | 3157.382059 | 3563.629201 |
| 6 | 2106.562001 | 2458.637054 | 9.242476e+06 | 1.124718e+07 | 3040.144036 | 3353.681524 |
| 7 | 2037.578828 | 2583.834179 | 8.646991e+06 | 1.248260e+07 | 2940.576657 | 3533.072451 |
| 8 | 2011.943922 | 2393.731295 | 8.419335e+06 | 1.138662e+07 | 2901.608961 | 3374.407129 |
| 9 | 2038.764307 | 2407.997667 | 8.009854e+06 | 1.124592e+07 | 2830.168597 | 3353.494102 |
| 10 | 1998.737805 | 2206.331138 | 8.201365e+06 | 8.736292e+06 | 2863.802593 | 2955.721868 |
| 11 | 2014.367212 | 2482.064230 | 8.414228e+06 | 1.160916e+07 | 2900.728954 | 3407.221816 |
| 12 | 2085.881434 | 2330.761701 | 8.934988e+06 | 1.049240e+07 | 2989.145060 | 3239.197152 |
| 13 | 2061.457960 | 2271.215899 | 8.482537e+06 | 1.029450e+07 | 2912.479531 | 3208.504163 |
| 14 | 2003.339351 | 2311.735219 | 8.103362e+06 | 1.000137e+07 | 2846.640435 | 3162.494532 |
| 15 | 2013.889376 | 2496.186374 | 8.092926e+06 | 1.252522e+07 | 2844.806935 | 3539.098203 |
| 16 | 2058.052176 | 2203.388316 | 8.495790e+06 | 9.529086e+06 | 2914.753842 | 3086.921740 |
| 17 | 1939.063923 | 2396.947164 | 7.873419e+06 | 1.097980e+07 | 2805.961264 | 3313.577843 |
| 18 | 1982.051029 | 2398.589982 | 8.107846e+06 | 1.111559e+07 | 2847.427919 | 3334.005166 |
| 19 | 1953.171736 | 2450.729489 | 7.992664e+06 | 1.168225e+07 | 2827.129945 | 3417.930505 |
| 20 | 2000.091070 | 2543.037171 | 8.185593e+06 | 1.290967e+07 | 2861.047483 | 3593.002678 |
| 21 | 1961.152666 | 2309.144296 | 7.710684e+06 | 1.045117e+07 | 2776.811847 | 3232.826858 |
| 22 | 1943.273149 | 2366.941002 | 7.427542e+06 | 1.046259e+07 | 2725.351670 | 3234.592261 |
| 23 | 1957.332048 | 2426.631432 | 7.416235e+06 | 1.220879e+07 | 2723.276450 | 3494.107539 |
| 24 | 1912.032472 | 2367.153511 | 7.695077e+06 | 1.054375e+07 | 2774.000167 | 3247.113593 |
| 25 | 2016.562669 | 2421.440937 | 8.309338e+06 | 1.178134e+07 | 2882.592320 | 3432.395584 |
| 26 | 1952.019032 | 2477.070143 | 7.695946e+06 | 1.193328e+07 | 2774.156763 | 3454.458097 |
| 27 | 1958.096399 | 2349.317443 | 8.054867e+06 | 1.069258e+07 | 2838.109828 | 3269.950592 |
| 28 | 1940.208625 | 2528.815554 | 7.629295e+06 | 1.286331e+07 | 2762.117835 | 3586.545442 |
| 29 | 1965.596148 | 2401.075885 | 7.947390e+06 | 1.121367e+07 | 2819.111474 | 3348.681779 |
| 30 | 1969.630044 | 2448.475997 | 7.943748e+06 | 1.276058e+07 | 2818.465461 | 3572.195182 |
| 31 | 2020.749465 | 2416.029432 | 7.970437e+06 | 1.115745e+07 | 2823.196169 | 3340.276358 |
| 32 | 1919.364014 | 2439.772242 | 7.908414e+06 | 1.141210e+07 | 2812.190168 | 3378.180035 |
| 33 | 1909.791418 | 2514.905207 | 7.012936e+06 | 1.184286e+07 | 2648.194799 | 3441.345926 |
| 34 | 1906.001625 | 2389.177372 | 7.666696e+06 | 1.094239e+07 | 2768.879968 | 3307.928604 |
| 35 | 1953.780610 | 2420.255617 | 7.763947e+06 | 1.121581e+07 | 2786.385938 | 3349.001724 |
| 36 | 1986.557154 | 2277.525776 | 8.221187e+06 | 1.024731e+07 | 2867.261300 | 3201.141697 |
| 37 | 1896.047226 | 2399.734026 | 7.151915e+06 | 1.144981e+07 | 2674.306398 | 3383.756173 |
| 38 | 1897.733189 | 2278.857193 | 7.216185e+06 | 9.882588e+06 | 2686.295745 | 3143.658362 |
| 39 | 1919.316434 | 2530.867845 | 7.820896e+06 | 1.268273e+07 | 2796.586510 | 3561.282753 |
| 40 | 1913.376595 | 2346.555895 | 7.383016e+06 | 1.094623e+07 | 2717.170640 | 3308.508458 |
| 41 | 1823.386168 | 2365.026844 | 6.858351e+06 | 1.107708e+07 | 2618.845331 | 3328.224871 |
| 42 | 1907.930966 | 2272.248721 | 7.436039e+06 | 1.048804e+07 | 2726.910086 | 3238.524763 |
| 43 | 1904.423842 | 2515.061461 | 7.231355e+06 | 1.229140e+07 | 2689.117828 | 3505.908649 |
| 44 | 1950.866619 | 2493.243156 | 8.014659e+06 | 1.184041e+07 | 2831.017298 | 3440.989388 |
| 45 | 1923.274910 | 2517.585887 | 7.466743e+06 | 1.258081e+07 | 2732.534195 | 3546.944003 |
| 46 | 1844.362405 | 2434.269232 | 7.093039e+06 | 1.141351e+07 | 2663.275901 | 3378.388956 |
| 47 | 1916.309631 | 2348.536953 | 7.229358e+06 | 1.065626e+07 | 2688.746623 | 3264.392825 |
| 48 | 1931.464973 | 2428.566394 | 7.293784e+06 | 1.140836e+07 | 2700.700707 | 3377.626377 |
| 49 | 1923.513881 | 2387.288767 | 7.916913e+06 | 1.126563e+07 | 2813.700917 | 3356.431746 |
| 50 | 1879.004111 | 2481.924378 | 7.225423e+06 | 1.265682e+07 | 2688.014672 | 3557.642438 |
| 51 | 1896.152620 | 2454.064629 | 7.067025e+06 | 1.235377e+07 | 2658.387601 | 3514.792851 |
| 52 | 1906.914574 | 2488.217491 | 7.508954e+06 | 1.133260e+07 | 2740.246982 | 3366.393418 |
| 53 | 1904.590783 | 2461.580204 | 7.140712e+06 | 1.123760e+07 | 2672.211070 | 3352.252968 |
| 54 | 1908.189896 | 2288.106879 | 7.700862e+06 | 1.032449e+07 | 2775.042634 | 3213.174359 |
| 55 | 1960.302362 | 2336.407477 | 7.897687e+06 | 1.093127e+07 | 2810.282372 | 3306.247020 |
| 56 | 1925.309409 | 2519.647891 | 7.385809e+06 | 1.258542e+07 | 2717.684480 | 3547.592975 |
| 57 | 1859.983744 | 2365.354084 | 7.358989e+06 | 1.156973e+07 | 2712.745745 | 3401.429939 |
| 58 | 1911.345778 | 2376.792081 | 7.364006e+06 | 1.091945e+07 | 2713.670167 | 3304.458750 |
| 59 | 1882.376384 | 2521.514272 | 6.862256e+06 | 1.240455e+07 | 2619.590727 | 3522.009492 |
| 60 | 1848.518274 | 2474.497431 | 7.077519e+06 | 1.251163e+07 | 2660.360727 | 3537.178004 |
| 61 | 1921.708254 | 2441.146370 | 7.471476e+06 | 1.146756e+07 | 2733.400030 | 3386.378698 |
| 62 | 1884.062272 | 2457.097806 | 7.595409e+06 | 1.145473e+07 | 2755.976961 | 3384.484303 |
| 63 | 1882.603689 | 2428.898940 | 7.487282e+06 | 1.300150e+07 | 2736.289841 | 3605.759860 |
| 64 | 1869.619868 | 2325.320570 | 6.930000e+06 | 1.011610e+07 | 2632.489291 | 3180.581164 |
| 65 | 1901.388890 | 2388.071904 | 7.413743e+06 | 1.171192e+07 | 2722.818881 | 3422.267844 |
| 66 | 1887.903106 | 2399.645214 | 7.102355e+06 | 1.202512e+07 | 2665.024340 | 3467.725477 |
| 67 | 1867.639031 | 2389.641466 | 7.056586e+06 | 1.149900e+07 | 2656.423582 | 3391.017114 |
| 68 | 1913.423277 | 2290.361412 | 6.992975e+06 | 9.765529e+06 | 2644.423333 | 3124.984680 |
| 69 | 1858.173315 | 2463.036674 | 6.999651e+06 | 1.212339e+07 | 2645.685356 | 3481.865620 |
plot_performance(max_features, 'Number of max_features')
The best max_features is 10.
GradientBoostingRegressor tuning function (note: despite the `xgboost_` naming below, the model is scikit-learn's GradientBoostingRegressor, not the XGBoost library)
def xgboost_tuning(X_train, y_train, X_test, y_test, index_train, index_test):
    # Fit scikit-learn's GradientBoostingRegressor with the manually tuned hyperparameters
    xgboost_reg = GradientBoostingRegressor(random_state=42, learning_rate=0.3,
                                            n_estimators=10, max_depth=3,
                                            min_samples_split=5, min_samples_leaf=1,
                                            max_features=10).fit(X_train, y_train)
    y_preds_train = xgboost_reg.predict(X_train)
    y_preds_test = xgboost_reg.predict(X_test)
    # Training-split metrics (y_true first, y_pred second)
    mse = mean_squared_error(y_train, y_preds_train)
    mae = mean_absolute_error(y_train, y_preds_train)
    rmse = mean_squared_error(y_train, y_preds_train, squared=False)
    xgboost_train = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse},
                                 index=[index_train])
    # Test/validation-split metrics
    mse = mean_squared_error(y_test, y_preds_test)
    mae = mean_absolute_error(y_test, y_preds_test)
    rmse = mean_squared_error(y_test, y_preds_test, squared=False)
    xgboost_test = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse},
                                index=[index_test])
    # Actual vs. predicted plots for both splits
    fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
    axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
    axes[0].plot(y_train, y_train, '-', color='red', label='Actual')
    axes[0].set_xlabel('Actual')
    axes[0].set_ylabel('Predicted')
    axes[0].set_title(f'{index_train}: Comparison of Actual vs. Predicted Target')
    axes[0].legend()
    axes[1].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
    axes[1].plot(y_test, y_test, '-', color='red', label='Actual')
    axes[1].set_xlabel('Actual')
    axes[1].set_ylabel('Predicted')
    axes[1].set_title(f'{index_test}: Comparison of Actual vs. Predicted Target')
    axes[1].legend()
    plt.show()
    xgboost_models = pd.concat([xgboost_train, xgboost_test])
    return xgboost_models
xg_tuning = xgboost_tuning(X_train, y_train, X_val, y_val, 'XGBoost_Tune_Train', 'XGBoost_Tune_Val')
xg_tuning
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Tune_Train | 1998.737805 | 8.201365e+06 | 2863.802593 |
| XGBoost_Tune_Val | 2206.331138 | 8.736292e+06 | 2955.721868 |
xg_feature1_tuning = xgboost_tuning(X_train_feature1, y_train, X_val_feature1, y_val, 'XGBoost_Tune_FEATURE1_Train', 'XGBoost_Tune_FEATURE1_Val')
xg_feature1_tuning
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Tune_FEATURE1_Train | 1987.072765 | 8.215540e+06 | 2866.276384 |
| XGBoost_Tune_FEATURE1_Val | 2301.942297 | 1.115216e+07 | 3339.484760 |
xg_feature2_tuning = xgboost_tuning(X_train_feature2, y_train, X_val_feature2, y_val, 'XGBoost_Tune_FEATURE2_Train', 'XGBoost_Tune_FEATURE2_Val')
xg_feature2_tuning
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Tune_FEATURE2_Train | 1940.521103 | 7.589272e+06 | 2754.863299 |
| XGBoost_Tune_FEATURE2_Val | 2276.243249 | 1.131108e+07 | 3363.195538 |
xg_feature3_tuning = xgboost_tuning(X_train_feature3, y_train, X_val_feature3, y_val, 'XGBoost_Tune_FEATURE3_Train', 'XGBoost_Tune_FEATURE3_Val')
xg_feature3_tuning
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Tune_FEATURE3_Train | 1987.072765 | 8.215540e+06 | 2866.276384 |
| XGBoost_Tune_FEATURE3_Val | 2306.776818 | 1.116378e+07 | 3341.225026 |
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform, randint
param_grid = {
'learning_rate': uniform(0.001, 0.3),
'n_estimators': randint(2, 100),
'max_depth': randint(1, 5),
'min_samples_split': [2, 5, 10, 20, 50, 100],
'min_samples_leaf': [1, 2, 5, 10, 20, 50, 100],
'max_features': randint(1, 10)
}
gb_regressor = GradientBoostingRegressor()
random_search = RandomizedSearchCV(
estimator=gb_regressor,
param_distributions=param_grid,
n_iter=10,
scoring='neg_mean_squared_error',
cv=5,
random_state=42
)
random_search.fit(X_train, y_train)
RandomizedSearchCV(cv=5, estimator=GradientBoostingRegressor(),
param_distributions={'learning_rate': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x386016980>,
'max_depth': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x46ef06230>,
'max_features': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x46ef074f0>,
'min_samples_leaf': [1, 2, 5, 10, 20,
50, 100],
'min_samples_split': [2, 5, 10, 20, 50,
100],
'n_estimators': <scipy.stats._distn_infrastructure.rv_discrete_frozen object at 0x41bed3550>},
random_state=42, scoring='neg_mean_squared_error')
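One detail worth noting about the distributions in `param_grid`: scipy parameterises `uniform(loc, scale)` as location plus width, so `uniform(0.001, 0.3)` draws learning rates from [0.001, 0.301], and `randint(low, high)` excludes `high`, so `max_depth` is sampled from 1 through 4. A quick sketch to confirm (the sample size is arbitrary):

```python
from scipy.stats import uniform, randint

# uniform(loc, scale) samples from [loc, loc + scale]
lr_samples = uniform(0.001, 0.3).rvs(size=10000, random_state=42)
print(lr_samples.min(), lr_samples.max())  # stays within [0.001, 0.301]

# randint(low, high) samples integers from low to high - 1 inclusive
depth_samples = randint(1, 5).rvs(size=10000, random_state=42)
print(sorted(set(depth_samples)))  # [1, 2, 3, 4]
```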
best_params = random_search.best_params_
best_score = random_search.best_score_
best_params
{'learning_rate': 0.12095829151457664,
'max_depth': 4,
'max_features': 3,
'min_samples_leaf': 10,
'min_samples_split': 20,
'n_estimators': 65}
best_score
-15708693.11887255
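Because the search used `scoring='neg_mean_squared_error'`, the best score above is a negative MSE; negating it and taking the square root turns it into a cross-validated RMSE that is comparable with the tables in this notebook:

```python
import numpy as np

# Best cross-validated score reported above (a negative MSE)
best_cv_score = -15708693.11887255
cv_rmse = np.sqrt(-best_cv_score)
print(round(cv_rmse, 2))  # roughly 3963
```

This cross-validated RMSE (about 3963) is noticeably higher than the single-split validation RMSEs reported elsewhere in the notebook, a reminder that one train/validation split can be optimistic.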
xg_random = GradientBoostingRegressor(**best_params).fit(X_train, y_train)
y_preds_train = xg_random.predict(X_train)
y_preds_val = xg_random.predict(X_val)
# Training metrics for the model refit with the random-search best parameters
mse = mean_squared_error(y_train, y_preds_train)
mae = mean_absolute_error(y_train, y_preds_train)
rmse = mean_squared_error(y_train, y_preds_train, squared=False)
xgboost_random_train = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse},
                                    index=['XGBoost_Random_Training'])
# Validation metrics
mse = mean_squared_error(y_val, y_preds_val)
mae = mean_absolute_error(y_val, y_preds_val)
rmse = mean_squared_error(y_val, y_preds_val, squared=False)
xgboost_random_val = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse},
                                  index=['XGBoost_Random_Validating'])
xgboost_random = pd.concat([xgboost_random_train, xgboost_random_val])
xgboost_random
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Random_Training | 1407.143534 | 4.645281e+06 | 2155.291443 |
| XGBoost_Random_Validating | 2326.201974 | 1.088273e+07 | 3298.897608 |
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 6))
axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
axes[0].plot(y_train, y_train, '-', color='red', label='Actual')
axes[0].set_xlabel('Actual')
axes[0].set_ylabel('Predicted')
axes[0].set_title('XGBoost_Random_Train: Comparison of Actual vs. Predicted Target')
axes[0].legend()
axes[1].plot(y_val, y_preds_val, 'o', color='orange', label='Predictions')
axes[1].plot(y_val, y_val, '-', color='red', label='Actual')
axes[1].set_xlabel('Actual')
axes[1].set_ylabel('Predicted')
axes[1].set_title('XGBoost_Random_Val: Comparison of Actual vs. Predicted Target')
axes[1].legend()
plt.show()
xg_model = pd.concat([xg, xg_feature1, xg_feature2, xg_feature3, xg_tuning, xg_feature1_tuning, xg_feature2_tuning, xg_feature3_tuning, xgboost_random])
xg_model
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Train | 1062.784899 | 2.039445e+06 | 1428.091324 |
| XGBoost_Val | 2370.002169 | 1.150793e+07 | 3392.334072 |
| XGBoost_FEATURE1_Train | 1251.945221 | 2.949687e+06 | 1717.465394 |
| XGBoost_FEATURE1_Val | 2381.406002 | 1.206589e+07 | 3473.598919 |
| XGBoost_FEATURE2_Train | 1114.508146 | 2.326707e+06 | 1525.354768 |
| XGBoost_FEATURE2_Val | 2330.471706 | 1.189117e+07 | 3448.357665 |
| XGBoost_FEATURE3_Train | 1251.945221 | 2.949687e+06 | 1717.465394 |
| XGBoost_FEATURE3_Val | 2390.399367 | 1.221464e+07 | 3494.944348 |
| XGBoost_Tune_Train | 1998.737805 | 8.201365e+06 | 2863.802593 |
| XGBoost_Tune_Val | 2206.331138 | 8.736292e+06 | 2955.721868 |
| XGBoost_Tune_FEATURE1_Train | 1987.072765 | 8.215540e+06 | 2866.276384 |
| XGBoost_Tune_FEATURE1_Val | 2301.942297 | 1.115216e+07 | 3339.484760 |
| XGBoost_Tune_FEATURE2_Train | 1940.521103 | 7.589272e+06 | 2754.863299 |
| XGBoost_Tune_FEATURE2_Val | 2276.243249 | 1.131108e+07 | 3363.195538 |
| XGBoost_Tune_FEATURE3_Train | 1987.072765 | 8.215540e+06 | 2866.276384 |
| XGBoost_Tune_FEATURE3_Val | 2306.776818 | 1.116378e+07 | 3341.225026 |
| XGBoost_Random_Training | 1407.143534 | 4.645281e+06 | 2155.291443 |
| XGBoost_Random_Validating | 2326.201974 | 1.088273e+07 | 3298.897608 |
multi_model
| | mae | mse | rmse |
|---|---|---|---|
| Baseline_Train | 8307.422361 | 1.105862e+08 | 10515.995863 |
| Baseline_Test | 7751.203825 | 1.000556e+08 | 10002.777477 |
| MultiLinear_Train | 2079.423872 | 9.298125e+06 | 3049.282748 |
| MultiLinear_Val | 2267.698408 | 1.029542e+07 | 3208.647385 |
| MultiLinear_Feature1_Train | 2070.482628 | 9.344101e+06 | 3056.812263 |
| MultiLinear_Feature1_Val | 2230.827601 | 1.010146e+07 | 3178.280116 |
| MultiLinear_Feature2_Train | 2093.693198 | 1.030223e+07 | 3209.708360 |
| MultiLinear_Feature2_Val | 2123.415974 | 9.319224e+06 | 3052.740411 |
| MultiLinear_Feature3_Train | 2278.747762 | 1.223434e+07 | 3497.762131 |
| MultiLinear_Feature3_Val | 2124.895843 | 9.739218e+06 | 3120.772055 |
lasso_model
| | mae | mse | rmse |
|---|---|---|---|
| Lasso_Train | 2093.725380 | 9.353681e+06 | 3058.378796 |
| Lasso_Val | 2279.843057 | 1.024395e+07 | 3200.616371 |
| Lasso_FEATURE1_Train | 2278.114357 | 1.223520e+07 | 3497.885577 |
| Lasso_FEATURE1_Val | 2123.981346 | 9.719629e+06 | 3117.631893 |
| Lasso_FEATURE2_Train | 2092.325287 | 1.030345e+07 | 3209.898411 |
| Lasso_FEATURE2_Val | 2117.264295 | 9.286441e+06 | 3047.366207 |
| Lasso_FEATURE3_Train | 2279.940306 | 1.223561e+07 | 3497.943503 |
| Lasso_FEATURE3_Val | 2127.655356 | 9.759305e+06 | 3123.988600 |
ridge_model
| | mae | mse | rmse |
|---|---|---|---|
| Ridge_Train | 2061.966576 | 9.326546e+06 | 3053.939442 |
| Ridge_Val | 2211.667197 | 9.930921e+06 | 3151.336341 |
| Ridge_FEATURE1_Train | 2274.841596 | 1.225304e+07 | 3500.433978 |
| Ridge_FEATURE1_Val | 2108.951082 | 9.600463e+06 | 3098.461357 |
| Ridge_FEATURE2_Train | 2084.101832 | 1.031979e+07 | 3212.443623 |
| Ridge_FEATURE2_Val | 2102.691899 | 9.118768e+06 | 3019.729731 |
| Ridge_FEATURE3_Train | 2278.747159 | 1.223434e+07 | 3497.762131 |
| Ridge_FEATURE3_Val | 2124.893956 | 9.739202e+06 | 3120.769408 |
elastic_model
| | mae | mse | rmse |
|---|---|---|---|
| Elastic_Train | 2284.880425 | 1.238910e+07 | 3519.815887 |
| Elastic_Val | 2073.707562 | 8.579011e+06 | 2928.994854 |
| Elastic_FEATURE1_Train | 2469.341627 | 1.406453e+07 | 3750.270443 |
| Elastic_FEATURE1_Val | 2185.184604 | 9.342266e+06 | 3056.512135 |
| Elastic_FEATURE2_Train | 2321.397873 | 1.289433e+07 | 3590.867219 |
| Elastic_FEATURE2_Val | 2109.088513 | 8.697749e+06 | 2949.194714 |
| Elastic_FEATURE3_Train | 2279.672369 | 1.223548e+07 | 3497.925509 |
| Elastic_FEATURE3_Val | 2126.979548 | 9.753365e+06 | 3123.037799 |
dt_model
| | mae | mse | rmse |
|---|---|---|---|
| Dtregressor_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_Val | 3022.997923 | 1.981871e+07 | 4451.821535 |
| Dtregressor_FEATURE1_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE1_Val | 3142.906175 | 2.105998e+07 | 4589.114999 |
| Dtregressor_FEATURE2_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE2_Val | 3379.599290 | 2.756409e+07 | 5250.151896 |
| Dtregressor_FEATURE3_Train | 0.000000 | 0.000000e+00 | 0.000000 |
| Dtregressor_FEATURE3_Val | 3090.758525 | 2.071248e+07 | 4551.096100 |
| DT_Tune_Train | 2428.632887 | 1.302080e+07 | 3608.435026 |
| DT_Tune_Val | 2377.058551 | 1.166857e+07 | 3415.929063 |
| DT_Tune_FEATURE1_Train | 2634.547381 | 1.511255e+07 | 3887.486135 |
| DT_Tune_FEATURE1_Val | 2843.669581 | 1.846317e+07 | 4296.879298 |
| DT_Tune_FEATURE2_Train | 2640.609970 | 1.566922e+07 | 3958.435982 |
| DT_Tune_FEATURE2_Val | 2532.673206 | 1.454901e+07 | 3814.316998 |
| DT_Tune_FEATURE3_Train | 2634.547381 | 1.511255e+07 | 3887.486135 |
| DT_Tune_FEATURE3_Val | 2855.303550 | 1.849346e+07 | 4300.402255 |
rf_model
| | mae | mse | rmse |
|---|---|---|---|
| RFregressor_Train | 952.337505 | 2.045942e+06 | 1430.364188 |
| RFregressor_Val | 2203.343703 | 9.785882e+06 | 3128.239434 |
| RFregressor_FEATURE1_Train | 969.471965 | 2.116189e+06 | 1454.712785 |
| RFregressor_FEATURE1_Val | 2187.242154 | 9.871982e+06 | 3141.970979 |
| RFregressor_FEATURE2_Train | 954.917457 | 2.064596e+06 | 1436.870097 |
| RFregressor_FEATURE2_Val | 2206.446608 | 9.923375e+06 | 3150.138845 |
| RFregressor_FEATURE3_Train | 969.307015 | 2.117008e+06 | 1454.994201 |
| RFregressor_FEATURE3_Val | 2190.343906 | 9.880833e+06 | 3143.379168 |
| RF_Tune_Train | 2163.637111 | 1.120318e+07 | 3347.114556 |
| RF_Tune_Val | 2368.454600 | 1.100253e+07 | 3317.006576 |
| RF_Tune_FEATURE1_Train | 2217.155423 | 1.176883e+07 | 3430.572444 |
| RF_Tune_FEATURE1_Val | 2406.069268 | 1.134822e+07 | 3368.712046 |
| RF_Tune_FEATURE2_Train | 2162.109950 | 1.127638e+07 | 3358.032533 |
| RF_Tune_FEATURE2_Val | 2389.063839 | 1.112963e+07 | 3336.109354 |
| RF_Tune_FEATURE3_Train | 2216.333066 | 1.176684e+07 | 3430.283430 |
| RF_Tune_FEATURE3_Val | 2403.261234 | 1.134797e+07 | 3368.674639 |
xg_model
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Train | 1062.784899 | 2.039445e+06 | 1428.091324 |
| XGBoost_Val | 2370.002169 | 1.150793e+07 | 3392.334072 |
| XGBoost_FEATURE1_Train | 1251.945221 | 2.949687e+06 | 1717.465394 |
| XGBoost_FEATURE1_Val | 2381.406002 | 1.206589e+07 | 3473.598919 |
| XGBoost_FEATURE2_Train | 1114.508146 | 2.326707e+06 | 1525.354768 |
| XGBoost_FEATURE2_Val | 2330.471706 | 1.189117e+07 | 3448.357665 |
| XGBoost_FEATURE3_Train | 1251.945221 | 2.949687e+06 | 1717.465394 |
| XGBoost_FEATURE3_Val | 2390.399367 | 1.221464e+07 | 3494.944348 |
| XGBoost_Tune_Train | 1998.737805 | 8.201365e+06 | 2863.802593 |
| XGBoost_Tune_Val | 2206.331138 | 8.736292e+06 | 2955.721868 |
| XGBoost_Tune_FEATURE1_Train | 1987.072765 | 8.215540e+06 | 2866.276384 |
| XGBoost_Tune_FEATURE1_Val | 2301.942297 | 1.115216e+07 | 3339.484760 |
| XGBoost_Tune_FEATURE2_Train | 1940.521103 | 7.589272e+06 | 2754.863299 |
| XGBoost_Tune_FEATURE2_Val | 2276.243249 | 1.131108e+07 | 3363.195538 |
| XGBoost_Tune_FEATURE3_Train | 1987.072765 | 8.215540e+06 | 2866.276384 |
| XGBoost_Tune_FEATURE3_Val | 2306.776818 | 1.116378e+07 | 3341.225026 |
| XGBoost_Random_Training | 1407.143534 | 4.645281e+06 | 2155.291443 |
| XGBoost_Random_Validating | 2326.201974 | 1.088273e+07 | 3298.897608 |
The best-performing model is XGBoost_Tune:
xg_tuning
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Tune_Train | 1998.737805 | 8.201365e+06 | 2863.802593 |
| XGBoost_Tune_Val | 2206.331138 | 8.736292e+06 | 2955.721868 |
xg_regressor = GradientBoostingRegressor(random_state=42, learning_rate=0.3, n_estimators=10, max_depth=3, min_samples_split=5, min_samples_leaf=1, max_features=10).fit(X_train, y_train)
y_preds_train = xg_regressor.predict(X_train)
y_preds_val = xg_regressor.predict(X_val)
y_preds_test = xg_regressor.predict(X_test)
# Metrics on all three splits for the final selected model
mse = mean_squared_error(y_train, y_preds_train)
mae = mean_absolute_error(y_train, y_preds_train)
rmse = mean_squared_error(y_train, y_preds_train, squared=False)
xgboost_train = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse},
                             index=['XGBoost_Training'])
mse = mean_squared_error(y_val, y_preds_val)
mae = mean_absolute_error(y_val, y_preds_val)
rmse = mean_squared_error(y_val, y_preds_val, squared=False)
xgboost_val = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse},
                           index=['XGBoost_Validating'])
mse = mean_squared_error(y_test, y_preds_test)
mae = mean_absolute_error(y_test, y_preds_test)
rmse = mean_squared_error(y_test, y_preds_test, squared=False)
xgboost_test = pd.DataFrame({'mae': mae, 'mse': mse, 'rmse': rmse},
                            index=['XGBoost_Testing'])
xgboost_models = pd.concat([xgboost_train, xgboost_val, xgboost_test])
xgboost_models
| | mae | mse | rmse |
|---|---|---|---|
| XGBoost_Training | 1998.737805 | 8.201365e+06 | 2863.802593 |
| XGBoost_Validating | 2206.331138 | 8.736292e+06 | 2955.721868 |
| XGBoost_Testing | 2589.837132 | 1.261266e+07 | 3551.430541 |
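The gap between the validation and test errors quantifies how much the selected model degrades on fully unseen data. Using the RMSEs from the table above:

```python
# RMSEs taken from the final evaluation table
val_rmse = 2955.721868
test_rmse = 3551.430541

degradation = (test_rmse - val_rmse) / val_rmse
print(f'Test RMSE is {degradation:.1%} higher than validation RMSE')
```

The test RMSE is about 20% higher than the validation RMSE, a moderate but expected degradation for a model tuned against a single validation split.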
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(20, 6))
axes[0].plot(y_train, y_preds_train, 'o', color='orange', label='Predictions')
axes[0].plot(y_train, y_train, '-', color='red', label='Actual')
axes[0].set_xlabel('Actual')
axes[0].set_ylabel('Predicted')
axes[0].set_title('XGBoost_Train: Comparison of Actual vs. Predicted Target')
axes[0].legend()
axes[1].plot(y_val, y_preds_val, 'o', color='orange', label='Predictions')
axes[1].plot(y_val, y_val, '-', color='red', label='Actual')
axes[1].set_xlabel('Actual')
axes[1].set_ylabel('Predicted')
axes[1].set_title('XGBoost_Val: Comparison of Actual vs. Predicted Target')
axes[1].legend()
axes[2].plot(y_test, y_preds_test, 'o', color='orange', label='Predictions')
axes[2].plot(y_test, y_test, '-', color='red', label='Actual')
axes[2].set_xlabel('Actual')
axes[2].set_ylabel('Predicted')
axes[2].set_title('XGBoost_Test: Comparison of Actual vs. Predicted Target')
axes[2].legend()
plt.show()
The final evaluation used the held-out test set to assess how well the selected model generalises to unseen data.
Among the models evaluated, the Gradient Boosting Regressor performed best after hyperparameter tuning, achieving MAEs of 1998.74, 2206.33, and 2589.84 and RMSEs of 2863.80, 2955.72, and 3551.43 on the training, validation, and test sets, respectively.
Overall, the evaluation confirmed that the Gradient Boosting Regressor predicts next month's spending effectively: its errors are far below the baseline (test RMSE of roughly 10,000), and the scatter plots show strong alignment between predicted and actual values.
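If the selected model is to be reused for scoring future months, one conventional option (not shown in the notebook) is to persist the fitted estimator with joblib, scikit-learn's recommended serialisation route. A minimal sketch, using a tiny synthetic stand-in for the real training data; the file name `xg_regressor.joblib` is illustrative:

```python
import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Tiny synthetic stand-in for the real feature matrix and target
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X[:, 0] * 3 + rng.normal(size=100)

model = GradientBoostingRegressor(random_state=42, n_estimators=10).fit(X, y)
joblib.dump(model, 'xg_regressor.joblib')      # persist the fitted estimator
restored = joblib.load('xg_regressor.joblib')  # reload for later scoring

# The restored model reproduces the original predictions exactly
assert np.allclose(model.predict(X), restored.predict(X))
```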